I'm sure you all agree

that machine learning is one of the hottest trends in today's

market, right? Gartner predicts that by 2022

at least 40% of new application development

projects in the market will require

machine learning co-developers on their team. It's expected that these projects

will generate revenue of around 3.9

trillion dollars. Isn't that huge? So,

looking at the huge upcoming demand for machine

learning around the world, we at Edureka have designed a well-structured

machine learning full course for you. But before we actually

drill down into it, let me just introduce myself. Hello all, I am Atul from Edureka, and today I'll be guiding you through this entire

machine learning course. Well, this course

has been designed in a way that you get the most out of it. So we'll slowly

and gradually start with a beginner level and then

move towards the advanced topic.

So without delaying any further, let's start with the agenda

of today's session. This machine learning

course has been segregated into six different modules.

We'll start our first module with an introduction to

machine learning. Here we'll discuss things like what exactly

machine learning is, how it differs from artificial

intelligence and deep learning, what its various types

and applications are, and finally we'll end

the first module with a basic demo in Python. Okay, our second module

focuses on stats and probability. Here

we'll cover things like descriptive statistics,

inferential statistics, probability theory, and so on. Our third module

is supervised learning. Well, supervised learning is one

type of machine learning which focuses mainly on regression and

classification types of problems.

It deals with labeled data sets,

and the algorithms which are a part of it

are linear regression, logistic regression, Naive Bayes,

random forest, decision tree, and so on. Our fourth module is

on unsupervised learning. Well, this module focuses

mainly on dealing with unlabeled data sets, and the algorithms

which are a part of it are the k-means algorithm and the Apriori algorithm. As

a part of the fifth module, we have reinforcement

learning. Here we are going to discuss

reinforcement learning in depth and also the Q-learning algorithm.

Finally, in the end,

it's all about making

you industry ready. Okay. So here we are going to discuss

three different projects, which are based

on supervised learning, unsupervised learning, and reinforcement learning.

Finally, in the end, I'll tell you about some

of the skills that you need to become

a machine learning engineer, okay, and I'll also

discuss some of the important questions that are asked in a

machine learning interview. Fine, with this we come

to the end of this agenda. Before you move ahead, don't forget to subscribe

to Edureka and press the bell icon to never miss

any update from us. Hello everyone. This is Atul from Edureka, and welcome to today's session

on what is machine learning. As you know, we are living

in a world of humans and machines. Humans

have been evolving and learning from their past

experience for millions of years. On the other hand,

the era of machines and robots has just

begun. In today's world, these machines or

robots need to be programmed before they actually

follow your instructions.

But what if the machines

started to learn on their own? This is where machine learning comes into the picture. Machine

learning is the core of many futuristic technological

advancements in our world. And today you can

see various examples or implementations of

machine learning around us, such as Tesla's self-driving

car, Apple's Siri, Sophia the AI robot, and many

more. So what exactly

is machine learning? Well, machine learning

is a subfield of artificial intelligence that focuses on

the design of systems that can learn from experience,

and make decisions and predictions based

on that experience, which is data in the case

of machines. Machine learning enables computers to act and make data-driven

decisions rather than being explicitly programmed to carry out a certain

task. These programs are designed to learn and improve over time when exposed to new data. Let's move on and discuss one of the biggest confusions

people have. They think that all

three of them, AI, machine learning, and

deep learning, are the same. You know what? They are wrong.

Let me clarify things for you. Artificial intelligence

is the broader concept of machines being able to carry

out tasks in a smarter way. It covers anything which enables

the computer to behave like humans. Think of the

famous Turing test to determine whether a computer is

capable of thinking like a human being or not. If you are talking

to Siri on your phone and you get an answer, you're

already very close to it. So this was about artificial

intelligence. Now coming to the machine learning part. As I already said,

machine learning is a subset or a current application

of AI. It is based on the idea that we should be able

to give machines access to data and let them learn

for themselves.

It's a subset

of artificial intelligence that deals with the extraction

of patterns from data sets. This means that the machine

can not only find the rules for optimal behavior but can also adapt

to changes in the world. Many of the algorithms

involved have been known for decades, centuries

even. Thanks to the advances in computer science

and parallel computing, they can now scale up

to massive data volumes. So this was about the machine

learning part. Now coming over to deep learning. Deep learning

is a subset of machine learning where similar machine learning algorithms are used to train

deep neural networks so as to achieve better

accuracy in those cases where the former was not performing

up to the mark, right? I hope you now understand

that machine learning, AI, and deep learning,

all three, are different.

Okay, moving on ahead. Let's see in general

how machine learning works. One of the approaches is where the machine learning

algorithm is trained using a labeled

or unlabeled training data set to produce a model. New input data is introduced to

the machine learning algorithm, and it makes a prediction

based on the model. The prediction is

evaluated for accuracy, and if the accuracy

is acceptable, the machine learning algorithm is deployed. Now if the accuracy

is not acceptable, the machine learning

algorithm is trained again and again with an augmented

training data set. This was just

a high-level example, as there are many more factors

and other steps involved in it.
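
To make that train, evaluate, and retrain loop concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the data, the 0.95 accuracy target, and the helper names (train, predict, accuracy) are made up for this example, and the "model" is just a threshold placed halfway between the two class averages.

```python
def train(samples):
    """Fit a one-feature threshold model: the midpoint between the class means."""
    zeros = [x for x, label in samples if label == 0]
    ones = [x for x, label in samples if label == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict(threshold, x):
    """Predict class 1 when the input is above the learned threshold."""
    return 1 if x > threshold else 0


def accuracy(threshold, samples):
    """Fraction of samples whose prediction matches the expected label."""
    hits = sum(predict(threshold, x) == label for x, label in samples)
    return hits / len(samples)


# Hypothetical labeled data: the true rule is "label 1 when the value is above 5".
training_set = [(0.0, 0), (6.0, 1)]
extra_batches = [[(4.5, 0), (10.0, 1)], [(5.0, 0), (5.5, 1)]]
test_set = [(i / 2, 1 if i / 2 > 5 else 0) for i in range(21)]

TARGET = 0.95  # assumed "acceptable" accuracy
model = train(training_set)
rounds = 0
# Retrain with an augmented training set until the accuracy is acceptable.
while accuracy(model, test_set) < TARGET and extra_batches:
    training_set = training_set + extra_batches.pop(0)
    model = train(training_set)
    rounds += 1

print(f"threshold={model}, accuracy={accuracy(model, test_set):.2f}, rounds={rounds}")
```

In this toy run the first model is not accurate enough, so one extra batch of training data is added before the model would be considered ready to deploy.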

Now, let's move on

and subcategorize machine learning into three different

types: supervised learning, unsupervised learning,

and reinforcement learning. Let's see what each

of them is, how they work, and how each of them is used in the fields

of banking, healthcare, retail, and other domains. Don't worry. I'll make sure that I use enough examples

and implementations of all three of them to give you

a proper understanding. So starting with

supervised learning. What is it? Let's see

a mathematical definition of supervised learning.

Supervised learning is where you have input variables X

and an output variable Y, and you use an algorithm

to learn the mapping function from the input to the output, that is, Y = f(X). The goal is to approximate

the mapping function so well that whenever

you have new input data X, you can predict

the output variable,

that is, Y,

for that data, right? I think this

was confusing for you. Let me simplify the definition

of supervised learning. We can rephrase

the mathematical definition

as a machine learning method where each instance of

a training data set is composed of different input attributes and an expected output.

The input attributes of a training data set can be

any kind of data. It can be a pixel of an image,

it can be a value

of a database row, or it can even be an audio

frequency histogram, right? Each input instance has an expected output value associated with it. The value can be discrete,

representing a category, or it can be a real or continuous

value. In either case, the algorithm learns

the input patterns that generate the

expected output. Now once the algorithm is trained, it can be used to predict

the correct output of a never-seen input. You can see an image

on your screen, right? In this image, you can see that we are feeding

raw inputs, images of apples, to the algorithm. As a part

of the algorithm, we have a supervisor

who keeps on correcting the machine, or who keeps

on training the machine. It keeps on telling it

that yes, it is an apple;

no, it is not an apple;

things like that. So this process keeps on repeating until we

get a final trained model. Once the model is ready, it can easily predict

the correct output of a never-seen input.

In this slide, you can see that we are giving an image

of a green apple to the machine, and the machine can easily

identify it as, yes, it is an apple, and it is giving

the correct result, right? Let me make things

clearer for you. Let's discuss another

example. In this slide, the image shows an example of a supervised learning process

used to produce a model which is capable of recognizing

the ducks in the image. The training data set

is composed of labeled pictures of ducks and non-ducks. The result of the supervised

learning process is a predictor model which is capable

of associating a label, duck or not duck, with a new image

presented to the model.
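
As a rough sketch of that idea, the Python snippet below learns from a labeled training set and then labels a never-seen input. Everything specific here is an assumption for illustration: the two numeric features per picture are invented, and a 1-nearest-neighbour rule stands in for whatever model the slide actually uses.

```python
import math


def predict(training_set, features):
    """Return the label of the closest labeled example (1-nearest neighbour)."""
    _, nearest_label = min(
        training_set, key=lambda example: math.dist(example[0], features)
    )
    return nearest_label


# Hypothetical labeled training set: (input attributes, expected output).
training_set = [
    ((0.9, 0.2), "duck"),
    ((0.8, 0.3), "duck"),
    ((0.1, 0.9), "not duck"),
    ((0.2, 0.8), "not duck"),
]

# A new picture the model has never seen, described by the same two features.
print(predict(training_set, (0.85, 0.25)))
```

The labels play the role of the supervisor: every training example comes with the correct answer attached.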

Now once trained, the resulting predictive

model can be deployed to the production environment, a mobile app, for example. Once deployed,

it is ready to recognize new pictures, right? Now, you might be wondering

why this category of machine learning is named

supervised learning. Well, it is called

supervised learning because the process

of an algorithm learning from the training data

set can be thought of as a teacher supervising

the learning process. We know the correct answers;

the algorithm

iteratively makes predictions on

the training data and is corrected by

the teacher. The learning stops when the algorithm achieves an

acceptable level of performance. Now, let's move on and see some of the popular supervised

learning algorithms. So we have linear

regression, random forest, and support vector machines. These are just

for your information. We will discuss

these algorithms in our next video. Now, let's see some

of the popular use cases of supervised learning. So we have Cortana,

or any other speech automation in your mobile

phone, which trains using your voice and, once trained, starts working

based on that training. This is an application

of supervised learning. Suppose you say

"OK Google, call Sam" or "Hey Siri, call

Sam": you get an answer, the action is performed, and automatically

a call goes to Sam. So these are just examples of supervised learning. Next

comes the weather app. Based on some

prior knowledge, like when it is sunny

the temperature is

higher, when it is cloudy the humidity

is higher, anything of that kind, it predicts the parameters

for a given time. So this is also an example

of supervised learning, as we are feeding the data

to the machine and telling it that whenever it is sunny, the temperature should be higher, and

whenever it is cloudy, the humidity should be higher. So it's an example

of supervised learning. Another example is

biometric attendance, where you train the machine

with a couple of inputs of your biometric identity,

be it your thumb, your iris, your earlobe, or anything else. Once trained, the machine can

validate your future input and can identify you. Next comes

the banking sector. In the banking sector, supervised learning is used

to predict the creditworthiness of a credit card holder by building a machine

learning model to look for faulty attributes,

providing it with data on delinquent and non-delinquent customers.

Next comes the healthcare sector.

In the healthcare sector, it is used to predict

patient readmission rates by building a regression model, providing data on the patients' treatment

administration and readmissions to show the variables that best correlate

with readmission. Next comes the retail sector.

In the retail sector, it is used to

analyze the products that customers buy together. It does this by building

a supervised model to identify frequent itemsets and association rules

from the transactional data. Now, let's learn about

the next category of machine learning, the

unsupervised part. Mathematically, unsupervised learning is where you only have input data X and no

corresponding output variable. The goal of unsupervised

learning is to model the underlying structure or distribution in the data in order to learn

more about the data.

So let me rephrase

this in simple terms. In the unsupervised learning

approach, the data instances of a training data

set do not have an expected output associated with them. Instead, the unsupervised learning algorithm

detects patterns based on innate characteristics of the input data. An example

of a machine learning task that applies unsupervised

learning is clustering. In this task, similar data

instances are grouped together in order to identify clusters

of data. In this slide, you can see that initially

we have different varieties of fruits as input. Now this set of fruits, as

input X, is given to the model. Now, once the model

is trained using an unsupervised learning algorithm, the model will create clusters

on the basis of its training. It will group the similar fruits

and make clusters of them. Let me make things

clearer for you. Let's take another

example. In this slide, the image

shows an example of an unsupervised learning process.

This algorithm processes an unlabeled training data set and, based on

the characteristics,

it groups the pictures into three different clusters

of data. Despite the ability to group similar

data into clusters, the algorithm is not capable

of adding labels to the groups. The algorithm only knows which

data instances are similar, but it cannot identify

the meaning of these groups. So now you might be wondering

why this category of machine learning is named

unsupervised learning. It is called unsupervised learning because,

unlike supervised learning, there are no correct answers and there is no teacher.

Algorithms are left on their own to discover and present the interesting

structure in the data. Let's move on and see some of the popular unsupervised

learning algorithms.

So we have here

k-means, the Apriori algorithm, and hierarchical clustering. Now, let's move on and see

some examples of unsupervised learning. Suppose a friend

invites you to his party, where you meet

total strangers. Now, you will classify them

using unsupervised learning, as you don't have

any prior knowledge about them, and this classification

can be done on the basis of gender, age group,

dressing, educational qualification, or whatever way you

might like. Now why is this learning different

from supervised learning? Since you didn't use

any past or prior knowledge about the people, you kept on

classifying them on the go. As they kept on coming, you

kept on classifying them: yes, this category

of people belongs to this group, this category of people belongs

to that group, and so on.
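
The party example can be sketched with a tiny k-means clustering in Python. This is a simplified illustration, not a full implementation: the guests' ages are invented, only one feature (age) is used, and the two starting centers are assumed guesses.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny k-means for one-dimensional data, starting from the given centers."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical ages of the strangers at the party; no labels are given.
ages = [19, 21, 22, 45, 47, 50]
centers, clusters = kmeans_1d(ages, centers=[20.0, 40.0])
print(clusters)
```

The algorithm ends up with two groups of similar ages, but it has no idea what the groups mean; naming them "students" or "parents" would still be up to a person.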

Okay, let's see

one more example. Let's suppose you have never

seen a football match before and by chance you watch

a video on the internet. Now, you can easily classify

the players on the basis of different criteria, like players wearing

the same kind of jersey are in one class and players

wearing a different kind of jersey are in a different class, or you can classify

them on the basis of their playing style,

like this guy is an attacker, so he's in one class; he's a defender,

so he's in another class; or you can classify them whatever way you

observe things. So this was also an example

of unsupervised learning. Let's move on and see how unsupervised learning

is used in the sectors of banking, healthcare, and retail. So starting with the banking sector. In the banking sector, it

is used to segment customers by behavioral characteristics,

by surveying prospects and customers to develop

multiple segments using clustering. In the

healthcare sector, it is used to categorize MRI

data as normal or abnormal images. It uses deep learning

techniques to build a model that learns from different

features of images to recognize different patterns.

Next is the retail sector.

In the retail sector, it is used to recommend

products to customers based on their past purchases. It does this by building

a collaborative filtering model based on those past

purchases. I assume you guys now have a proper idea of

what unsupervised learning means. If you have even the slightest doubt, don't hesitate to add your

doubt to the comment section. So let's discuss the third and the last type

of machine learning, that is, reinforcement learning.

So what is

reinforcement learning? Well, reinforcement learning is a type of machine

learning algorithm which allows software agents and machines to automatically

determine the ideal behavior within a specific context

to maximize their performance. Reinforcement learning

is about interaction between two elements: the environment and

the learning agent. The learning agent leverages

two mechanisms, namely exploration and exploitation. When the

learning agent acts on a trial-and-error basis, it is termed exploration, and when it acts based

on the knowledge gained from the environment, it is referred

to as exploitation. Now this environment rewards

the agent for correct actions, which is the reinforcement signal.

Leveraging the rewards obtained, the agent improves its knowledge of the

environment to select the next action. In this image, you can see

that the machine is confused whether it is an apple

or not an apple. Then the machine is trained

using reinforcement learning.

If it makes a correct decision, it gets reward points for it, and in case of a wrong one, it gets

a penalty. Once the training is done, the machine can easily identify

which one of them is an apple. Let's see an example. Here we can see that we have an agent who has to judge

from the environment to find out which of the two is

a duck. The first task it does is to observe

the environment. Next, it selects some action

using some policy. It seems that the machine

has made a wrong decision by choosing a bunny as a duck. So the machine will

get a penalty for it, for example,

-50 for a wrong answer, right? Now the machine will

update its policy, and this will continue till the machine gets

an optimal policy. From the next time, the machine will

know that a bunny is not a duck. Let's see some of the use cases

of reinforcement learning. But before that, let's see how Pavlov trained his dog

using reinforcement learning, or how he applied the reinforcement method

to train his dog. Pavlov integrated learning in four stages. Initially,

Pavlov gave meat to his dog, and in response to the meat

the dog started salivating. Next, he created a sound with the bell; to this, the dog

did not respond at all. In the third part, he

tried to condition the dog by using the bell and then giving him

the food; seeing the food, the dog started salivating.

Eventually, a situation came when the dog started salivating

just after hearing the bell, even if the food was not given to him,

as the dog was reinforced that whenever the master

rings the bell, he will get the food.
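
Going back to the duck-or-bunny agent, the reward-and-penalty loop with exploration and exploitation can be sketched in Python as below. Treat it as a toy under assumptions: the -50 penalty comes from the example above, while the +10 reward, the epsilon-greedy rule, the step size, and the function name train_agent are choices made only for this illustration.

```python
import random


def train_agent(steps=100, epsilon=0.2, step_size=0.5, seed=0):
    """Epsilon-greedy agent that learns which picture is the duck."""
    reward_for = {"duck": 10, "bunny": -50}  # +10 is assumed; -50 is the penalty
    actions = list(reward_for)
    value = {action: 0.0 for action in actions}  # estimated value of each action
    rng = random.Random(seed)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)          # exploration: trial and error
        else:
            action = max(actions, key=value.get)  # exploitation: use what was learned
        reward = reward_for[action]               # the environment's reinforcement signal
        # Update the agent's knowledge, moving the estimate toward the reward.
        value[action] += step_size * (reward - value[action])
    return value


value = train_agent()
best_action = max(value, key=value.get)
print(best_action, value)
```

After training, the agent's estimates favour pointing at the duck, because that action was rewarded and the other one was penalised.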

Let's move on and see how reinforcement learning

is applied in the fields of banking, healthcare,

and retail. So starting with

the banking sector. In the banking sector, reinforcement

learning is used to create a next-best-offer model for a call center

by building a predictive model that learns over time as users accept or reject offers

made by the sales staff. Fine. Now in the healthcare sector, it

is used to allocate scarce resources to handle

different types of ER cases by building

a Markov decision process that learns treatment strategies

for each type of ER case. Next, and last, comes

the retail sector. So let's see how reinforcement learning

is applied to the retail sector. In the retail sector, it can be used to

reduce excess stock with dynamic pricing by building

a dynamic pricing model that adjusts the price based on customer response

to the offers. I hope by now you have attained

some understanding of what machine learning is

and you are ready to move ahead. Welcome to today's topic

of discussion on AI versus machine learning

versus deep learning.

These are the terms

which have confused a lot of people, and if you

too are one among them, let me resolve it for you. Well, artificial intelligence

is a broader umbrella under which machine learning and deep learning come. You

can also see in the diagram that even deep learning is a subset of machine

learning, so you can say that all three of them, AI,

machine learning, and deep learning, are nested

subsets of one another. So let's move on and understand how exactly they differ

from each other. So let's start

with artificial intelligence. The term artificial intelligence was first coined

in the year 1956. The concept is pretty old, but it has gained

its popularity recently. But why? Well, the reason is that earlier we had

a very small amount of data. The data we had was not enough

to predict accurate results. But now there's

a tremendous increase in the amount of data. Statistics suggest that by 2020

the accumulated volume of data will increase from 4.4 zettabytes to

roughly around 44 zettabytes, or 44 trillion GBs of data. Along with such

an enormous amount of data,

now we have more

advanced algorithms and high-end computing

power and storage that can deal with such large

amounts of data. As a result, it is expected that 70% of enterprises

will implement AI over the next 12 months, which is up from 40 percent

in 2016 and 51 percent in 2017. Just for your understanding, what is AI? Well, it's nothing but a technique that enables the machine

to act like humans by replicating their behavior

and nature. With AI, it is possible for machines to learn

from experience. The machines adjust

their responses based on new inputs, thereby performing human-like tasks.

Artificial intelligence can be trained to accomplish specific tasks by processing

large amounts of data and recognizing patterns in them. You can consider that building an artificial

intelligence is like building a church. The first churches

took generations to finish, so most of the workers

working on them never saw the final outcome. Those working on them took pride

in their craft, building bricks and chiseling stone that was going to be placed

into the great structure.

So as AI researchers, we should think of ourselves

as humble brick makers whose job is just to study how to build components,

for example parsers, planners, learning algorithms,

et cetera, anything that someday, someone,

somewhere will integrate into intelligent systems.

Some of the examples of artificial intelligence

from our day-to-day life are Apple's Siri, chess-playing

computers, Tesla's self-driving car, and many more. These examples

are based on deep learning and natural language processing. Well, this was about what AI is

and how it gained its hype. So moving on ahead, let's discuss machine

learning and see what it is and why it was

even introduced. Well, machine learning came

into existence in the late 80s and the early 90s. But what were the issues

that made machine learning

come into existence? Let us discuss them one by one.

In the field of statistics,

the problem was how to efficiently train

large complex models. In the field of computer science

and artificial intelligence, the problem was

how to train more robust versions of AI systems, while in

the case of neuroscience, the problem faced by

the researchers was how to design operational

models of the brain. So these were some of the issues which had the largest influence

and led to the existence of machine learning. Now machine learning

shifted its focus from the symbolic approaches it had inherited

from AI and moved towards the methods and models

it had borrowed from statistics

and probability theory. So let's proceed and see what exactly

machine learning is. Well, machine learning

is a subset of AI which enables the

computer to act and make data-driven decisions

to carry out a certain task. These programs or algorithms

are designed in a way that they can learn

and improve over time when exposed to new data. Let's see an example

of machine learning. Let's say you want

to create a system which tells the expected weight

of a person based on their height. The first thing you do

is collect the data. Let's say this is how your data looks.

Now each point on the graph represents

one data point. To start with, we can draw a simple line

to predict the weight based on the height, for example,

a simple line W = H - 100, where W

is weight in kg and H is height in centimeters. This line can

help us to make the prediction.
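
Here is a small Python sketch of that idea. It scores the starting line W = H - 100 against a handful of data points and compares it with a least-squares line, which is one common way to pick a line with a smaller error. The five height and weight pairs are invented for illustration, and the helper names are assumptions.

```python
def mean_squared_error(predict, heights, weights):
    """Average squared difference between predicted and actual weights."""
    return sum((predict(h) - w) ** 2 for h, w in zip(heights, weights)) / len(heights)


def fit_line(heights, weights):
    """Least-squares slope and intercept for weight = slope * height + intercept."""
    mean_h = sum(heights) / len(heights)
    mean_w = sum(weights) / len(weights)
    covariance = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights))
    variance = sum((h - mean_h) ** 2 for h in heights)
    slope = covariance / variance
    return slope, mean_w - slope * mean_h


def make_line(slope, intercept):
    """Build a prediction function from a slope and an intercept."""
    return lambda h: slope * h + intercept


# Hypothetical data: heights in centimeters, weights in kg.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 88]

baseline = make_line(1, -100)  # the starting line W = H - 100
slope, intercept = fit_line(heights, weights)
fitted = make_line(slope, intercept)

baseline_error = mean_squared_error(baseline, heights, weights)
fitted_error = mean_squared_error(fitted, heights, weights)
print(f"baseline error={baseline_error:.2f}, fitted error={fitted_error:.2f}")
print(f"predicted weight at 175 cm: {fitted(175):.1f} kg")
```

On this made-up data the fitted line has a smaller error than the starting guess, which is the sense in which the model "learns" from the data points.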

Our main goal is

to reduce the difference between the estimated value

and the actual value. So in order to achieve it, we try to draw a straight line

that fits through all these different points

and minimizes the error. So our main goal is

to minimize the error and make it as small as

possible. Decreasing the error, or the difference between

the actual value and the estimated value, increases the performance

of the model. Further, the more data points we collect, the better our model will become. We

can also improve our model by adding more variables and creating different

prediction lines for them. Once the line is created, from the next time,

if we feed new data, for example the height

of a person, to the model, it will easily predict the data

for you and tell you what the predicted

weight could be. I hope you got

a clear understanding of machine learning. So moving on ahead, let's learn about deep learning.

Now what is deep learning? You can consider a deep learning

model as a rocket engine, and its fuel is

the huge amount of data that we feed to

these algorithms. The concept of deep learning is not new, but recently its

hype has increased and deep learning

is getting more attention.

This field is a particular kind

of machine learning that is inspired by the functionality of

our brain cells, called neurons, which led to the concept

of the artificial neural network. It simply takes the data connections between all

the artificial neurons and adjusts them according

to the data pattern. More neurons are added if the size of the data is large.

It automatically features learning at multiple

levels of abstraction,

thereby allowing a system to learn complex function

mappings without depending on any specific algorithm. You know what? No one

actually knows what happens inside a neural network

and why it works so well, so currently you can call

it a black box. Let us discuss some

of the examples of deep learning and understand it

in a better way. Let me start with

a simple example and explain how things happen

at a conceptual level. Let us try and understand how you would recognize

a square from other shapes. The first thing

you do is check whether there are four lines

associated with the figure or not. Simple concept, right? If yes, we further check if they are connected

and closed. Again, if yes,

we finally check

whether they are perpendicular and all the sides

are equal, correct? If everything is fulfilled, yes, it is a square. Well, it is nothing but

a nested hierarchy of concepts. What we did here is we

took the complex task of identifying a square, in this case, and broke it

into simpler tasks. Now deep learning

also does the same thing, but at a larger scale. Let's take an example

of a machine which recognizes animals. The task

of the machine is to recognize whether the given image is

of a cat or a dog.
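
Before going further with the cat-or-dog task, the square check described above can be written out as a nested hierarchy of simple tests. This Python sketch is hand-written logic, not deep learning; the representation of a shape as an ordered list of line segments with integer coordinates is an assumption made for the example.

```python
def is_square(segments):
    """Check a shape given as ordered line segments ((x1, y1), (x2, y2))."""
    # Step 1: are there four lines?
    if len(segments) != 4:
        return False
    # Step 2: are they connected and closed?
    for i, (_, end) in enumerate(segments):
        if end != segments[(i + 1) % 4][0]:
            return False
    # Step 3: are neighbouring sides perpendicular and all sides equal?
    vectors = [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in segments]
    squared_lengths = {dx * dx + dy * dy for dx, dy in vectors}
    perpendicular = all(
        vectors[i][0] * vectors[(i + 1) % 4][0]
        + vectors[i][1] * vectors[(i + 1) % 4][1] == 0
        for i in range(4)
    )
    return perpendicular and len(squared_lengths) == 1 and 0 not in squared_lengths


square = [((0, 0), (2, 0)), ((2, 0), (2, 2)), ((2, 2), (0, 2)), ((0, 2), (0, 0))]
rectangle = [((0, 0), (3, 0)), ((3, 0), (3, 2)), ((3, 2), (0, 2)), ((0, 2), (0, 0))]
print(is_square(square), is_square(rectangle))
```

Each step only runs if the simpler one before it passed, which is the same "complex task broken into simpler tasks" idea, just written by hand.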

What if we were asked to resolve

the same issue using the concept of machine learning?

What would we do? First, we would define

the features, such as checking whether the animal has

whiskers or not, whether the animal has pointed ears or not, or whether its tail

is straight or curved. In short, we would define

the facial features and let the system identify which

features are more important in classifying a

particular animal. Now when it comes to deep learning,

it takes this one step ahead. Deep learning automatically

finds the features which are most important

for classification, compared to machine learning, where we had to manually give

out the features. By now,

I guess you have understood that AI is the bigger picture and machine learning and

deep learning are its parts. So let's move on and focus our discussion

on machine learning and deep learning. The easiest

way to understand the difference between machine learning

and deep learning is to know that deep learning is machine

learning. More specifically,

it is the next evolution

of machine learning. Let's take a few

important parameters and compare machine learning

with deep learning. So starting with

data dependencies. The most important difference

between deep learning and machine learning is

its performance as the volume of the data gets

larger. From the graph, you can see that when the size of the data

is small, a deep learning algorithm doesn't perform that well. But why? Well, this is because a deep

learning algorithm needs a large amount of data

to understand it perfectly. On the other hand, a machine

learning algorithm can easily work with a smaller data set. Fine. Next comes hardware

dependencies. Deep learning algorithms are heavily dependent

on high-end machines, while machine learning

algorithms can work on low-end machines as well. This is because the requirements of deep learning

algorithms include GPUs, which are an integral part of their working. Deep learning

algorithms require GPUs as they do a large amount of matrix

multiplication operations, and these operations can only be efficiently

optimized using a GPU, as it is built for this purpose

only. Our third parameter will be feature engineering. Well,

feature engineering is the process of applying domain knowledge

to reduce the complexity of the data and make patterns more visible

to learning algorithms. This process is difficult

and expensive in terms of time and expertise. In the case

of machine learning, most of the features need

to be identified by an expert and then hand-coded

as per the domain and the data type. For example, the features can be pixel values, shapes,

textures, position, orientation, or anything else. Fine. The performance of most machine

learning algorithms depends on how accurately the features

are identified and extracted, whereas in the case of deep learning, the algorithms

try to learn high-level features from the data.
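
To show what hand-coded features look like in the traditional machine learning setting, here is a minimal Python sketch using the cat-or-dog features mentioned earlier. The six labeled examples are invented, and a simple perceptron is used as a stand-in learner; it is an assumed choice for illustration, not the method from the video.

```python
def predict(weights, bias, features):
    """Return 1 (cat) when the weighted feature score is positive, else 0 (dog)."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train_perceptron(examples, epochs=10):
    """Perceptron rule: adjust the weights only when a prediction is wrong."""
    weights, bias = [0, 0, 0], 0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)  # 0 when correct
            weights = [w + error * f for w, f in zip(weights, features)]
            bias += error
    return weights, bias


# Features picked by a person: (has_whiskers, pointed_ears, curved_tail).
# Hypothetical labeled examples: 1 = cat, 0 = dog.
examples = [
    ((1, 1, 0), 1),
    ((0, 0, 1), 0),
    ((1, 1, 1), 1),
    ((0, 1, 1), 0),
    ((1, 0, 0), 1),
    ((0, 0, 0), 0),
]

weights, bias = train_perceptron(examples)
print(weights, bias)
```

The person decides which features exist; the algorithm only learns how much weight to give each one. In the deep learning setting, the features themselves would be learned from the raw image.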

This is a very distinctive part

of deep learning, which makes it way ahead of traditional machine learning.

Deep learning reduces the task of developing a new feature

extractor for every problem. For example, in the case of a CNN algorithm, it first tries

to learn the low-level features of the image, such as

edges and lines, then it proceeds

to the parts of faces of people, and then finally to

the high-level representation of the face. I hope that things

are getting clearer to you. So let's move on ahead and see

the next parameter. So our next parameter is

the problem-solving approach. When we are solving a problem

using a traditional machine learning algorithm, it is generally recommended that we first break

down the problem into different sub-parts,

solve them individually, and then finally combine them

to get the desired result. This is how a machine learning

algorithm handles the problem. On the other hand,

a deep learning algorithm solves the problem

from end to end. Let's take an example to understand this.

Suppose you have a task of multiple object detection, and your task is to identify what the object is and where it

is present in the image.

So let's see and compare how you would tackle this using machine learning and deep learning, starting with machine learning. In a typical machine learning approach, you would first divide the problem into two steps: object detection and then object recognition. First, you would use a bounding box detection algorithm such as GrabCut, for example, to scan through the image and find all the possible objects. Once the candidate objects are found, you would use an object recognition algorithm such as SVM with HOG to recognize the relevant objects. Finally, when you combine the results, you would be able to identify what each object is and where it is present in the image.

In the deep learning approach, on the other hand, you would do the process end to end. For example, with YOLO net, which is a type of deep learning algorithm, you would pass in an image and it would give out the location along with the name of the object.

Now let's move on to our fifth comparison parameter: execution time. Usually a deep learning algorithm takes a long time to train, because it has so many parameters that training takes longer than usual. Training might even last two weeks or more if you are training completely from scratch, whereas machine learning takes much less time to train, ranging from a few seconds to a few hours. The execution time is completely reversed when it comes to testing. During testing, the deep learning algorithm takes much less time to run.

Whereas if you compare it with a KNN algorithm, which is a type of machine learning algorithm, the test time increases as the size of the data increases.

Last but not least, we have interpretability as a factor for comparing machine learning and deep learning. This factor is the main reason why people still think ten times before using deep learning in industry. Let's take an example. Suppose we use deep learning to give automated scores to essays. Its scoring performance is excellent and close to human performance, but there is an issue: it does not reveal why it has given that score. Mathematically, it is indeed possible to find out which nodes of a deep neural network were activated, but we don't know what the neurons are supposed to model and what these layers of neurons are doing collectively.

So we fail to interpret the result. On the other hand, machine learning algorithms like decision trees give us crisp rules for why they chose what they chose, so it is particularly easy to interpret the reasoning behind them. Therefore, algorithms like decision trees and linear or logistic regression are primarily used in industry for interpretability.

Let me summarize things for you. Machine learning uses algorithms to parse data, learn from that data, and make informed decisions based on what it has learned. Deep learning structures algorithms in layers to create an artificial neural network that can learn and make intelligent decisions on its own. Finally, deep learning is a subfield of machine learning. While both fall under the broad category of artificial intelligence, deep learning is usually what's behind the most human-like artificial intelligence.

Now, in the early days, scientists used to have a lab notebook to track progress, results, and conclusions. Jupyter is a modern-day tool that allows data scientists to record the complete analysis process, much in the same way other scientists use a lab notebook.

Now, the Jupyter project was originally developed as a part of the IPython project. The IPython project was used to provide interactive online access to Python. Over time, it became useful to interact with other data analysis tools such as R in the same manner. With the split from Python, the tool grew into its current manifestation, Jupyter. IPython is still an active tool that's available for use. The name Jupyter itself is derived from the combination of Julia, Python, and R. While Jupyter runs code in many programming languages, Python is a requirement for installing the Jupyter Notebook itself.

Now, to download Jupyter Notebook, there are a few ways listed on the official website. It is strongly recommended to install Python and Jupyter using the Anaconda distribution, which includes Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science, although one can also do so using the pip installation method. Personally, what I would suggest is downloading Anaconda Navigator, which is a desktop graphical user interface included in Anaconda. It allows you to launch applications and easily manage conda packages, environments, and channels without the need to use command-line commands.

So all you need to do is go to anaconda.org and, inside, go to Anaconda Navigator. As you can see here, we have the conda installation command, which you can use to install it on your PC, or you can use these installers. Once you download Anaconda Navigator, it looks something like this. As you can see here, we have JupyterLab, Jupyter Notebook, and Qt Console, which is an IPython console. We have Spyder, which is somewhat similar to RStudio but for Python. Then we have RStudio, Orange 3, Glueviz, and VS Code. Our focus today will be on the Jupyter Notebook itself. When you launch the Navigator, you can see there are many options available for launching Python as well as R instances.

Now, by definition, a Jupyter notebook is fundamentally a JSON file with a number of annotations.
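Because a notebook is plain JSON, its structure can be inspected with Python's standard json module. The following is a minimal sketch: the notebook dictionary is written by hand for illustration rather than produced by Jupyter, and it shows the top-level keys of format version 4.

```python
import json

# A minimal notebook document written by hand (illustrative, not saved by Jupyter).
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

text = json.dumps(notebook, indent=1)   # the kind of text stored in an .ipynb file
loaded = json.loads(text)               # reading it back is ordinary JSON parsing

print(sorted(loaded.keys()))
print(len(loaded["cells"]), "cell(s), format version", loaded["nbformat"])
```

Opening a real .ipynb file in a text editor shows the same kind of layout, with more metadata filled in.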

Now, it has three main parts, which are the metadata, the notebook format, and the list of cells. You should get yourself acquainted with the environment. The Jupyter user interface has a number of components, so it's important to know which components you will be using on a daily basis. As you can see here, our focus today will be on the Jupyter Notebook, so let me just launch it. What it does is create an online Python instance for you to use over the web. So let's launch it. Now, as you can see, we have Jupyter on the top left, as expected, and this acts as a button to go to your home page. Whenever you click on it, you get back to your home page, which is the dashboard. There are three tabs displayed: Files, Running, and Clusters. We'll go through all three and understand what each of them is for. The Files tab shows the list of the current files in the directory.

As you can see, we have so many files here. The Running tab presents another screen with the currently running processes and notebooks. The drop-down lists for the terminals and notebooks are populated with whatever is currently running. As you can see, we do not have any running terminals or running notebooks as of now. The Clusters tab presents another screen that displays the list of available clusters. In the top right corner of the screen, there are three buttons: Upload, New, and Refresh. Let me go back so you can see them here: Upload, New, and the Refresh button.

Now, the Upload button is used to add files to the notebook space, and you may also just drag and drop files as you would when handling files elsewhere. Similarly, you can drag and drop notebooks into specific folders. The New menu at the top presents a further menu of Text File, Folder, Terminal, and Python 3. The Text File option is used to add a text file to the current directory. Jupyter will open a new browser window running a text editor. The text you enter is automatically saved and will be displayed in your notebook's file list. The Folder option creates a new folder with the name Untitled Folder, and remember that all file and folder names are editable. The Terminal option opens a terminal session in the browser, where you can start IPython. The notebook options available will be activated when additional notebooks are available in your environment. The Python 3 option is used to begin a Python 3 session interactively in your notebook. The interface looks like the following screenshot.

Now, what you have is full file editing capabilities for your script, including saving as a new file. You also have a complete IDE for your Python script. Now we come to the Refresh button. The Refresh button is used to update the display. It's not really necessary, as the display is reactive to any changes in the underlying file structure. At the top of the Files tab item list, there is a checkbox, a drop-down menu, and a Home button, as you can see here. The checkbox is used to toggle all the checkboxes in the item list. As you can see, you can select all of these and then either move or delete all of the selected files, or you can select all and deselect some of the files as you wish. The drop-down menu presents a list of available choices, which are Folders, All Notebooks, Running, and Files. The Folders selection will select all the folders in the display and present a count of the folders in the small box.

As you can see here, we have 18 folders. The All Notebooks selection will change the count to the number of notebooks and provide you with a few options. As you can see here, it has selected all the notebooks, and you get the option to duplicate the current notebook, move it, view it, edit it, or delete it. The Running selection will select any running scripts and update the count to the number selected. As you can see here, we have zero running scripts. The Files selection will select all the files in the notebook display and update the count accordingly. If you select Files here, we have seven files, as you can see: some datasets, CSV files, and text files. The Home button brings you back to the home screen of the notebook.

So all you need to do is click on the Jupyter Notebook logo, and it will bring you back to the Jupyter Notebook dashboard. As you can see, on the left-hand side of every item is a checkbox, an icon, and the item's name. The checkbox is used to build a set of files to operate upon, and the icon indicates the type of the item. In this case, all of these items are folders. Coming down, we have the running notebooks, and finally we have certain files, which are the text files and the CSV files.

Now, a typical workflow for any Jupyter notebook is to first create a notebook for the project or your data analysis, add your analysis steps, code, and output, and surround your analysis with organizational and presentational Markdown to communicate an entire story. Interactive notebooks that include widgets and display modules can then be used by others, who modify the parameters and the data to note the effects of the changes.

Now, if we talk about security, Jupyter notebooks are created in order to be shared with other users, in many cases over the internet. However, a Jupyter notebook can execute arbitrary code and generate arbitrary output. This can be a problem if malicious content has been placed in the notebook. The default security mechanisms for Jupyter notebooks include the following: raw HTML is always sanitized and checked for malicious code, and you cannot run external JavaScript. The cell contents, especially HTML and JavaScript, are not trusted and require user validation to continue. The output from any cell is not trusted, all other HTML or JavaScript is never trusted, and clearing the output will cause the notebook to become trusted when saved. Notebooks can also use a security digest to ensure the correct user is modifying the contents.
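The general idea of a keyed digest can be sketched with Python's standard hmac module. This is a minimal illustration under assumed names (sign_notebook, is_trusted, and the example secret are all hypothetical); it is not Jupyter's actual signing code.

```python
import hashlib
import hmac
import json

def sign_notebook(notebook, secret):
    """Return a hex digest of the notebook contents, keyed by a secret."""
    # Serialize deterministically so the same contents always give the same digest.
    payload = json.dumps(notebook, sort_keys=True).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def is_trusted(notebook, secret, expected_digest):
    """Check a notebook against a previously stored digest."""
    return hmac.compare_digest(sign_notebook(notebook, secret), expected_digest)

secret = b"example-secret"  # hypothetical; a real secret should be long and random
notebook = {"cells": [{"cell_type": "code", "source": ["print(1)"]}]}
digest = sign_notebook(notebook, secret)

# Any change to the contents, or a different secret, produces a different digest.
tampered = {"cells": [{"cell_type": "code", "source": ["print(2)"]}]}
print(is_trusted(notebook, secret, digest))
print(is_trusted(tampered, secret, digest))
```

Only someone who holds the secret can produce a digest that matches modified contents, which is why the secret is shared only with people who are allowed to edit the notebook.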

A digest takes into account the entire contents of the notebook and a secret, which is known only by the notebook creator, and this combination ensures that malicious code is not going to be added to the notebook. You can add a security digest to a notebook using the setting I have given here: under the Jupyter directory, inside the profile you have selected, go to security and then the notebook secret. You can replace the notebook secret with your own secret, and that will act as a key for the notebook. You then need to share that key with your colleagues or whoever you want to share the notebook with. That keeps the notebook secure and away from malicious coders.

Another aspect of Jupyter is configuration. You can configure some of the display parameters used in presenting notebooks. These are configurable due to the use of a product known as CodeMirror to present and modify the notebook. CodeMirror is basically a JavaScript-based editor for use within web pages and notebooks. As you can see here, CodeMirror is a versatile text editor implemented in JavaScript for the browser, and it allows you to configure the options for Jupyter.

So now let's execute some Python code and understand the notebook in a better way. Jupyter does not interact with your scripts as much as it executes your script and requests the result.

I think this is how Jupyter notebooks have been extended to other languages besides Python: Jupyter just takes a script, runs it against a particular language engine, and collects the output from the engine, all the while not really knowing what kind of script is being executed. The new window shows an empty cell for you to enter Python code. Under New, you select Python 3, and that opens a new notebook.

Now this notebook is untitled, so let's give the new work area a name: Python codes. As you can see, we have renamed this notebook. The autosave information should be next to the title; as you can see, it says last checkpoint a few days ago, unsaved changes. The autosave option is always on. With an accurate name, we can find this notebook very easily from the notebook home page. If you select your browser's Home tab and refresh, you will find this new notebook name displayed there. If you go to the notebook home, you can see Python codes listed, and under Running you also have Python codes. So let's get back to the notebook. One thing to note is that it has a notebook item icon rather than a folder icon, and the automatically assigned extension, as you can see here, is .ipynb, for IPython notebook. Since the item is open in a browser in a Jupyter environment, it is marked as running, and there is a file by that name in this directory as well. If you go to your directory and check, you can see under the user's folder that the Python codes file is there, so that IPython notebook has automatically been created in our working environment and on the local disk as well.

If you open the .ipynb file in a text editor, you will see the basic contents of a Jupyter notebook. As you can see, the cells are empty; nothing is there. So let's type in some code here. For example, I'm going to set name equal to Edureka. Next, I'm going to set subscribers equal to 700k. To run this cell, you click on the Run icon, and you will see here we have a one, so this is the first cell to be executed. In the second cell, we enter Python code that references the variables from the first cell. As you can see here, we have a print statement that joins name, the string has, and subscribers. Let me run this cell. As you can see, we now have an output here: Edureka has 700k YouTube subscribers. It's more than 700k by now. To know more about Jupyter and other technologies, you can subscribe to our channel and get updates on the latest trending technologies.

Note that Jupyter color-codes your Python just as a decent editor would, and we have empty brackets to the left of each code block, as you can see here. If we execute the cell, the results are displayed inline. It's interesting that Jupyter keeps the output last generated in the saved version of the file and in its saved checkpoints. If we were to rerun the cells using Rerun or Run All, the output would be regenerated and saved via autosave. The cell number is incremented: as you can see, if I rerun this, the cell number changes from one to three, and if I rerun the next one, the cell number changes from two to four.
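The two cells described above can be sketched as follows. The exact print statement is not legible in the recording, so the string concatenation here is an assumption that reproduces the quoted output.

```python
# Cell 1: define two variables (values as read out in the video).
name = "Edureka"
subscribers = "700k"

# Cell 2: reference the variables defined in the first cell.
message = name + " has " + subscribers + " YouTube subscribers"
print(message)
```

In a notebook, each commented section would go in its own cell; the second cell works because both cells share the same running Python session.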

So Jupyter keeps track of the latest version of each cell. Similarly, if you close the browser tab and refresh the display in the Home tab, you will find the new item we created, the Python codes notebook, saved by autosave; as you can see here, it says autosaved in the brackets. If we close this and go to the home page, you can see Python codes listed. If we click on it, it opens the same notebook, and the previously displayed items are still there, showing the output that we generated in the last run.

Now that we have seen how Python works in Jupyter, including the underlying encoding, let's see how a Python dataset works in Jupyter. Let me create another new Python notebook, which I'm going to name Pandas. Here we will read in a large dataset and compute some standard statistics on the data. What we are interested in is seeing how to use pandas in Jupyter, how well the script performs, and what information is stored in the metadata, especially if it's a large dataset. Our Python script accesses the Iris dataset that's built into one of the Python packages. All we are looking to do is read in a slightly larger number of items and calculate some basic operations on the dataset. First of all, what we need to do is from sklearn import datasets. sklearn is scikit-learn, and it is another Python library.

It contains a lot of datasets for machine learning, along with the machine learning algorithms themselves. So the import was successful. Next, we're going to pull in the Iris data: iris_dataset equals datasets.load_iris(). I'm sorry, it's datasets.load_iris with a lowercase load. As you can see, the number here is three now, because the second run encountered an error: the module is datasets, not dataset. Now we're going to grab the first two columns of the data. So let's type X equals, and if you press Tab, it automatically completes what you're going to write, iris_dataset.data, and we take all rows and the first two columns. To run it from your keyboard, all you need to do is press Shift+Enter. Next, we're going to calculate some basic statistics. x_count equals the length function applied to X.flat, and similarly we compute x_min, x_max, and x_mean. Now let's display the results. As you can see, the count is 300 and the minimum value is 3.8.
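The cells typed in this walkthrough can be collected into one sketch. The exact expressions for the minimum and maximum are not audible in the recording. The quoted minimum of 3.8 is below the smallest sepal length in the dataset (4.3), so this sketch assumes the range of the first column was padded by 0.5 on each side; treat that padding as a guess.

```python
from sklearn import datasets

iris_dataset = datasets.load_iris()

# All rows, first two columns (sepal length and sepal width).
X = iris_dataset.data[:, :2]

x_count = len(X.flat)          # 150 rows x 2 columns
# Assumption: statistics of the first column, with the range padded by 0.5.
x_min = X[:, 0].min() - 0.5
x_max = X[:, 0].max() + 0.5
x_mean = X[:, 0].mean()

print(x_count, x_min, x_max, x_mean)
```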

The maximum is 8.4 and the mean is 5.843333. So let me connect you

to real life and tell you about the things you can easily do using the concepts of machine learning. You can easily get answers to questions like: Which types of houses lie in this segment? What is the market value of this house? Is this email spam or not spam? Is there any fraud? These are some of the questions you could ask the machine, but to get an answer you need an algorithm; the machine needs to train on the basis of some algorithm. But how will you decide which algorithm to choose and when? The best option for us is to explore them one by one.

The first is the classification algorithm, where a category is predicted using the data. If you have a question like, is this person male or female, or is this email spam or not spam, then that category of question falls under classification. Classification is a supervised learning approach in which the computer program learns from the input given to it and then uses this learning to classify new observations. Some examples of classification problems are speech recognition, handwriting recognition, biometric identification, document classification, etc.

Next is the anomaly

detection algorithm, where you identify unusual data points. So what is anomaly detection? It's a technique used to identify unusual patterns that do not conform to expected behavior, or, you can say, outliers. It has many applications in business: intrusion detection, such as identifying strange patterns in network traffic that could signal a hack; system health monitoring, such as spotting a deadly tumor in an MRI scan; fraud detection in credit card transactions; and fault detection in operating environments.

Next comes the clustering algorithm. You can use clustering to group data based on some similarity condition. You can get answers to questions like which types of houses lie in this segment, or what type of customer buys this product. Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to each other than to those in other groups. In simple words, the aim is to segregate groups with similar traits and assign them into clusters. Let's understand this with an example. Suppose you are the head of a rental store and you wish to understand the preferences of your customers to scale up your business. Is it possible for you to look at the details of each customer and design a unique business strategy for each of them? Definitely not, right? But what you can do is cluster all your customers into, say, 10 different groups based on their purchasing habits and use a separate strategy for the customers in each of these ten groups.
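As a rough sketch of the rental-store idea, the following groups made-up customers with scikit-learn's KMeans. The customer data, the two features, and the choice of two groups instead of ten are assumptions made to keep the example small.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [visits per month, average spend per visit].
rng = np.random.default_rng(0)
occasional = rng.normal(loc=[2, 20], scale=1.0, size=(20, 2))
frequent = rng.normal(loc=[10, 60], scale=1.0, size=(20, 2))
customers = np.vstack([occasional, frequent])

# Ask for two groups here; the rental-store example would ask for ten.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(customers)

print(labels)
```

Each customer receives a group label, and a separate strategy can then be designed per label rather than per customer.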

And this is what we call clustering.

Next we have the regression algorithm, where the value itself is predicted. Questions you may ask this type of model are like: What is the market value of this house? How much is it going to rain tomorrow? Regression is one of the most important and broadly used machine learning and statistics tools. It allows you to make predictions from data by learning the relationship between the features of your data and some observed continuous-valued response. Regression is used in a massive number of applications; for example, stock price prediction can be done using regression.

Now that you know about different machine learning algorithms, how will you decide which algorithm to choose and when? Let's cover this part using a demo. In this demo, we will create six different machine learning models, pick the best one, and build confidence that it has the most reliable accuracy. For our demo, we will be using the Iris dataset.

This dataset is very famous and is considered one of the best small projects to start with. You can consider it the hello world dataset for machine learning. It consists of 150 observations of iris flowers. There are four columns of measurements of the flowers in centimeters, and the fifth column is the species of the flower observed. All the observed flowers belong to one of three species: Iris setosa, Iris virginica, and Iris versicolor. This is a good project because it is so well understood. The attributes are numeric, so you have to figure out how to load and handle the data. It is a classification problem, thereby allowing you to practice with perhaps an easier type of supervised learning algorithm. It has only four attributes and 150 rows, meaning it is very small and can easily fit into memory. And all of the numeric attributes are in the same unit and on the same scale, which means you do not require any special scaling or transformation to get started. So let's start coding, and

as I told you earlier, for this demo I'll be using Anaconda with Python 3 installed on it. When you install Anaconda, this is how your Navigator will look. This is the home page of my Anaconda Navigator. I'll be using the Jupyter Notebook, which is a web-based interactive computing notebook environment that will help me write and execute my Python code. So let's hit the Launch button and start our Jupyter Notebook. As you can see, my Jupyter Notebook is starting on localhost:8890. Okay, so this is my Jupyter Notebook. What I'll do here is select New, then Python 3. This is my environment, where I can write and execute all my Python code. Let's start by checking the versions of the libraries. In order to make this video short, more interactive, and more informative, I've already written the code, so let me just copy and paste it, and I'll explain it to you piece by piece. So let's start by checking the versions of the Python libraries. Okay, there is the code; let's copy it and paste it. First, let me summarize what we are doing here.

We are just checking the versions of the different libraries, starting with Python. We'll first check which version of Python we are working on, then we'll check which versions of SciPy, NumPy, matplotlib, pandas, and scikit-learn we are using. So let's hit the Run button and see the versions of the libraries we are using. We are working on Python 3.6.4, SciPy 1.0, NumPy 1.14, matplotlib 2.1.2, pandas 0.22, and scikit-learn 0.19. These are the versions I'm using. Ideally your versions should be more recent, or they should match, but don't worry if you are a few versions behind: the APIs do not change that quickly, so everything in this tutorial will very likely still work for you. In case you are getting an error, stop and try to fix it. If you are unable to find the solution, feel free to reach out to Edureka even after this class.
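The version check described above can be sketched as follows. The code pasted in the video is not shown in the recording, so this is one plausible way to write it; the versions dictionary is an illustrative choice.

```python
import sys

import matplotlib
import numpy
import pandas
import scipy
import sklearn

# Collect the version strings so they can be printed in one place.
versions = {
    "Python": sys.version.split()[0],
    "scipy": scipy.__version__,
    "numpy": numpy.__version__,
    "matplotlib": matplotlib.__version__,
    "pandas": pandas.__version__,
    "sklearn": sklearn.__version__,
}

for library, version in versions.items():
    print(f"{library}: {version}")
```

If any of the imports fails, that library is missing from the environment and needs to be installed before continuing.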

Let me tell you this: if you are not able to run the script properly, you will not be able to complete this tutorial. So whenever you have a doubt, reach out to Edureka and get it resolved. If everything is working smoothly, then now is the time to load the dataset. As I said, I'll be using the Iris flower dataset for this tutorial. But before loading the dataset, let's import all the modules, functions, and objects that we are going to use in this tutorial. Again, I've already written the code, so let's just copy and paste it and load all the libraries. These are the various libraries we will be using in our tutorial. Everything should work fine without an error. If you get an error, stop: you need a working SciPy environment before you continue any further. I guess everything should work fine, so let's hit the Run button and see. Okay, it worked.

So let's now move ahead and load the data. We can load the data directly from the UCI Machine Learning Repository. First of all, let me tell you that we are using pandas to load the data. So this is my URL for the UCI Machine Learning Repository, from where I will be downloading the dataset.

Okay. Now what I'll do is specify the name of each column when loading the data. This will help me later when exploring the data. So I'll just copy and paste it. I'm defining a variable, names, which consists of the column names: sepal length, sepal width, petal length, petal width, and class. These are just the names of the columns in the dataset. Now let's define the dataset: dataset equals pandas.read_csv, and inside that we pass the URL and names equal to names. As I already said, we are using pandas to load the data, so we call pandas.read_csv to read the CSV file. Inside it, we say where the CSV is coming from, which is the URL, and names equals names, which just specifies the names of the various columns in that CSV file. So let's move forward and execute it.
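The loading step can be sketched as follows. The URL is not legible in the recording, so this sketch reads a few rows from an in-memory stand-in with the same headerless layout. The hyphenated column names are also an assumption about spelling.

```python
import io

import pandas

# Column names as read out in the video (hyphenated spelling is an assumption).
names = ["sepal-length", "sepal-width", "petal-length", "petal-width", "class"]

# Stand-in for the downloaded file: a few rows in the same headerless CSV layout.
sample_csv = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

# In the demo, the first argument is the URL (or a local file name) instead.
dataset = pandas.read_csv(sample_csv, names=names)
print(dataset)
```

Passing names is what gives the columns their labels, because the data file itself has no header row.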

So our dataset is loaded. In case you have network issues, just go ahead and download the Iris data file into your working directory and load it using the same method, but make sure that you change the URL to the local file name, or else you might get an error. Okay, our dataset is loaded, so let's move ahead and check it out. Let's see how many rows and columns we have in our dataset. So let's print dataset.shape. This will give you the total number of rows and the total number of columns, or, you can say, the total number of instances and attributes in your dataset. With print(dataset.shape), we are getting 150 and 5. So 150 is the total number of rows in the dataset and five is the total number of columns. Moving on, what if I want to see a sample of the dataset?

So let me just print the first few instances of the dataset: print(dataset.head(30)), since what I want is the first 30 instances. This will give me the first 30 rows of my dataset. When I hit the Run button, I get the first 30 rows, 0 to 29. So this is what my sample dataset looks like: sepal length, sepal width, petal length, petal width, and the class. Now let's move on and look at the summary of each attribute. What if I want to find out the count, the mean, the minimum and maximum values, and some percentiles as well?
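The two inspection steps covered so far, the shape and the first 30 rows, can be sketched as follows. To stay runnable without a network connection, this sketch rebuilds the table from scikit-learn's bundled copy of Iris instead of the CSV URL; that substitution and the column spellings are assumptions.

```python
import pandas
from sklearn.datasets import load_iris

# Assumption: use scikit-learn's bundled Iris data as a stand-in for the CSV.
iris = load_iris()
names = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pandas.DataFrame(iris.data, columns=names)
dataset["class"] = [iris.target_names[i] for i in iris.target]

print(dataset.shape)      # (number of rows, number of columns)
print(dataset.head(30))   # rows 0 to 29
```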

So what should I do then? For that, print

dataset.describe. What did it give? Let's see. So you can see that all the numbers are

on the same scale, in a similar range between 0 and 8 centimeters, right? The mean value,

the standard deviation, the minimum value,

the 25th percentile, 50th percentile, 75th percentile, the maximum value: all

these values lie in the range between

0 and 8 centimeters. Okay. So what we just did is

we just took a summary of each attribute. Now, let's look

at the number of instances that belong to each class. So for that what we'll do

print the data set. First of all, so let's print the data set, and I want to group it,

group by, using class, and I want the size

of it, the size of each class. Fine, and let's hit Run. Okay. So what I want to do: I want to

print out the data set. How do I want to get it? I want it by class. So group by class. Okay. Now I want the size of each class,

the size of each class.
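The inspection calls in this part of the walkthrough (`shape`, `head`, `describe`, and a `groupby` on the class column followed by `size`) can be sketched together. The six-row frame below is a hypothetical stand-in, not the full 150-row data set, and the column labels are assumed.

```python
import pandas as pd

# A tiny stand-in frame with the same five columns (values are illustrative).
dataset = pd.DataFrame(
    {
        "sepal-length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
        "sepal-width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
        "petal-length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
        "petal-width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
        "class": [
            "Iris-setosa", "Iris-setosa",
            "Iris-versicolor", "Iris-versicolor",
            "Iris-virginica", "Iris-virginica",
        ],
    }
)

print(dataset.shape)       # (number of rows, number of columns)
print(dataset.head(3))     # first 3 rows; the walkthrough asks for 30
print(dataset.describe())  # count, mean, std, min, percentiles, max

# Number of instances that belong to each class.
class_counts = dataset.groupby("class").size()
print(class_counts)
```

On the full data set the same `groupby("class").size()` call is what yields the per-class counts read out in the talk.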

So groupby class, dot size,

and execute the run. So you can see that I have 50 instances

of Iris-setosa, 50 instances of Iris-versicolor, and 50 instances

of Iris-virginica. Okay, all of data type

int64. Fine. So now we have a basic

idea of the data. Now, let's move ahead and create

some visualizations for it. So for this we are going

to create two different types of plots: first would be

the univariate plots, and next would be

the multivariate plots. So we'll be creating univariate

plots to better understand each attribute, and next we'll be creating

the multivariate plots to better understand the relationships

between different attributes. Okay. So we start with

some univariate plots, that is, plots

of each individual variable. So given that the input

variables are numeric, we can create box

and whiskers plots for them. Okay. So let's move ahead and create

a box and whiskers plot. So dataset.plot. What kind do I want? It's a box. Okay, do I need subplots? Yeah, I need subplots for that.

So subplots equals True. What type of layout do I want? So

my layout structure is 2 cross 2. Next, do I want

to share my coordinates, the X and Y coordinates? No, I don't want to share them. So sharex equals False, and even sharey,

that too equals False. Okay. So we have our

dataset.plot, kind equals box, subplots equals True, layout is

2 by 2, and then what I want to do is, I want to see it. So plt.show

shows whatever I created. Okay, execute it. This gives us

a much clearer idea about the distribution

of the input attribute.

Now what if, instead of giving

the layout as 2 cross 2, I had given

it as 4 cross 4? What would it result in?

Just see. Fine. Everything would be printed

in just one single row. Hold on, guys, Aria has a doubt. He's asking why

we're using the sharex and sharey values. What are these, and why have we

assigned False values to them? Okay, Aria. So in order to

resolve this query, I need to show you

what will happen if I give True values to them. Okay, so be with me.

So sharex equals True and sharey,

that equals True. So let's see

what result we'll get.

You're getting it? The X and Y coordinates are just

shared among all four visualizations.

Right? So you can see that the sepal length

and sepal width have Y values ranging from 0.0

to 7.5, which are being shared among both visualizations. So

it is with the petal length.

It has shared values between 0.0

and 7.5. Okay, so that is why

I don't want to share the values of X and Y; it's just giving us

a cluttered visualization. So Aria, why am I doing this? I'm just doing it because I don't want my X

and Y coordinates to be shared among the visualizations. Okay. That is why my sharex and sharey

values are False. Okay, let's execute it. So this is a pretty

clear visualization, which gives a clear idea

about the distribution of the input attributes.
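A minimal sketch of the box-and-whisker call discussed above, using the pandas `DataFrame.plot` arguments named in the talk (`kind`, `subplots`, `layout`, `sharex`, `sharey`). The data here is randomly generated as a stand-in for the four numeric columns, and the non-interactive backend is an assumption made so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend (assumption); drop this line to see windows

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in for the four numeric attributes (not the real measurements).
rng = np.random.default_rng(0)
columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(
    rng.normal(loc=[5.8, 3.0, 3.8, 1.2], scale=0.5, size=(30, 4)),
    columns=columns,
)

# One box plot per column on a 2x2 grid; each panel keeps its own axis range.
dataset.plot(kind="box", subplots=True, layout=(2, 2), sharex=False, sharey=False)

fig = plt.gcf()          # the figure pandas just created
n_axes = len(fig.axes)   # one axes object per attribute
# When running interactively, call plt.show() here to display the figure.
```

Setting `sharex=True, sharey=True` instead forces every panel onto one common axis range, which is the cluttered result shown in the talk.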

Now if you want you

can also create a histogram of each input variable

to get a clear idea of the distribution. So let's create

a histogram for it. So dataset.hist. Okay, I would need to see it, so plt.show. Let's see. So there's my histogram,

and it seems that we have two input variables

that have a Gaussian distribution. So

this is useful to note, as we can use algorithms that can exploit

this assumption. Okay. So next comes

the multivariate plot. Now that we have created the

univariate plot to understand about each attribute. Let's move on and look

at the multivariate plot and see the interaction between

the different variables. So first, let's look

at the scatter plot of all the attribute

this can be helpful to spot structured relationship

between input variables. Okay. So let's create

a scatter matrix. So for creating a scatter plot,

we need scatter_matrix, and we need to pass

our data set into it. Okay.

And then what I want is,

I want to see it. So plt.show. So this is

how my scatter matrix looks. Note the diagonal grouping

of some pairs of attributes, right? So this suggests

a high correlation and a predictable relationship. All right. This was our multivariate plot. Now, let's move on

and evaluate some algorithms. It's time to create

some models of the data and estimate their accuracy

on the basis of unseen data.
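The histogram and scatter-matrix steps described above could look roughly like this. `DataFrame.hist` and `pandas.plotting.scatter_matrix` are real pandas calls; the random data and the headless backend are assumptions made so the sketch is self-contained.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend (assumption); drop this line to see windows

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Random stand-in for the four numeric attributes (not the real measurements).
rng = np.random.default_rng(1)
columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
dataset = pd.DataFrame(
    rng.normal(loc=[5.8, 3.0, 3.8, 1.2], scale=0.5, size=(30, 4)),
    columns=columns,
)

# Univariate view: one histogram per attribute.
dataset.hist()
hist_fig = plt.gcf()

# Multivariate view: every attribute plotted against every other attribute.
scatter_axes = scatter_matrix(dataset)
# When running interactively, call plt.show() here to display both figures.
```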

Okay. So now we know all

about our data set, right? We know how many instances and attributes are

there in our data set. We know the summary

of each attribute. So I guess we have seen much

about our data set. Now. Let's move on

and create some algorithm and estimate their accuracy

based on the Unseen data. Okay. Now what we'll do we'll create

some model of the data and estimate the accuracy based

on some unseen data. Okay. So for that, first of all, let's create a

validation data set. What is a validation data

set? A validation data set is a part of your data that is held back from training, so that we can use it

to check the trained model. Fine.

All right. So how will create

a validation data set for creating

a validation data set? What we are going to do is we

are going to split our data set into two parts. Okay. So the very first thing

we'll do is to create a validation data set. So why do we even need

a validation data set? So we need a validation

data set to know whether the model we

created is any good. Later, what we'll do is we'll use

statistical methods to estimate the accuracy

of the models that we create on unseen data. We also want a more concrete

estimate of the accuracy of the best model on unseen data by evaluating it

on actual unseen data. Okay, confused? Let me simplify this for you. What we'll do is we'll

split the loaded data into two parts. The first

80 percent of the data

we'll use to train our model,

and the remaining 20 percent we'll hold back as the validation data set that we'll use to verify

our trained model. Okay, fine. So let's define an array. This is my array; what

will it consist of? It will consist of all the values

from the data set. So dataset.values. Okay, next, I'll define a variable X, which will consist

of all the rows of the array and the columns from 0

up to 4, so columns 0 to 3, and then a variable Y, which will consist

of the class column of the array. So first of all, we will define a variable X

that will consist of the values in the array, taking the columns

from the beginning, 0, till 4. Okay.

So these are the columns

that we'll include in the X variable, and for the Y variable, I'll define

it as the class, or the output. So what I need, I just need the column at index 4,

that is my class column. So I'll take all the rows

from the beginning, and I just want

column 4. Okay, now I'll define

my validation size. validation_size, I'll define it as 0.20,

and I'll use a seed; I define seed equals 6. So this seed sets

the starting value used in generating random numbers. Okay, I'll define the value

of seed equals 6. I'll tell you what

the importance of that is later on. Okay.

So let me define first

a few variables, such as X_train, X_test,

Y_train, and Y_test. Okay, so what we want to do

is select some models. Okay, so model_selection.

But before doing that, what we have to do

is split our data set into two parts. Okay, so model_selection.train_test_split.

What we want to split

is the values of X and Y. Okay, and my test size equals

the validation size, which is 0.20, correct,

and my random state is equal to seed.

So what is the seed doing? It's helping me to keep the same

randomness in the training and testing data sets. Fine.
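The split described above can be sketched with scikit-learn's `model_selection.train_test_split`. To keep the example offline, it assumes the copy of the iris data bundled with scikit-learn (`datasets.load_iris`) in place of the CSV loaded in the talk; the array slicing and the values `0.20` and `6` follow the walkthrough.

```python
import numpy as np
from sklearn import datasets, model_selection

# Assumption: use scikit-learn's bundled iris data instead of the CSV file.
iris = datasets.load_iris()
array = np.column_stack([iris.data, iris.target])  # 150 rows, 5 columns

X = array[:, 0:4]  # all rows, columns 0 to 3 (the four measurements)
Y = array[:, 4]    # all rows, column 4 (the class)

validation_size = 0.20  # hold back 20 percent
seed = 6                # fixed seed so the split is repeatable

X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    X, Y, test_size=validation_size, random_state=seed
)

print(X_train.shape, X_test.shape)
```

Re-running with the same `random_state` reproduces exactly the same rows in each part, which is the point of fixing the seed.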

So let's execute it and see

what our result is. It's executed. Next, we'll create a test

harness. For this, we'll use 10-fold

cross-validation to estimate the accuracy. So what it will do is, it

will split our data set into 10 parts, train

on nine parts and test on one part, and this will repeat

for all combinations of train and test splits. Okay. So for that, let's define again my seed,

that was six, already defined, and scoring

equals accuracy. Fine. So we are using the metric of

accuracy to evaluate the models.

So what is this? This is the ratio of the number

of correctly predicted instances divided by the total number

of instances in the data set, multiplied by a hundred, giving a

percentage. For example, it's 98 percent accurate or 99 percent

accurate, things like that. Okay, so we'll be

using the scoring variable when we build and evaluate each model

in the next step. The next part is building

models. Till now, we don't know which algorithm

would be good for this problem or what configuration to use.
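The accuracy metric defined a moment ago (correct predictions divided by total instances, times a hundred) is simple enough to write out by hand. The function name and the toy labels below are hypothetical, chosen only to illustrate the arithmetic.

```python
def accuracy_percent(actual, predicted):
    """Share of predictions that match the actual labels, as a percentage."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)


# Toy labels (made up): four of the five predictions are right.
actual = ["setosa", "setosa", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "setosa", "versicolor", "versicolor", "virginica"]

score = accuracy_percent(actual, predicted)
print(score)  # 80.0
```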

So let's begin with

six different algorithms. I'll be using logistic regression, linear

discriminant analysis, k-nearest neighbors,

classification and regression trees, naive Bayes, and support vector machines. Well, these algorithms I'm using are

a good mixture of simple linear and nonlinear algorithms.

The simple linear ones include logistic regression and linear discriminant

analysis, and the nonlinear part includes the KNN

algorithm, the CART algorithm, naive Bayes,

and support vector machines.

Okay. So we reset

the random number seed before each run

to ensure that the evaluation of each algorithm is performed using exactly

the same data splits. It ensures the results

are directly comparable. Okay, so let me

just copy and paste it. Okay. So what are we doing here? We are building

six different types of models. We are building

logistic regression, linear discriminant analysis, k-nearest neighbors, decision

tree, Gaussian naive Bayes, and the support vector machine. Okay, next, what we'll do is we'll

evaluate each model in turn. Okay. So what is this? So we have six different models and accuracy estimations

for each one of them. Now we need to compare

the models to each other and select the most

accurate of them all. So running the script, we

see the following results; we can see some

of the results on the screen.
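A sketch of the test harness described above: six scikit-learn models, each scored with 10-fold cross-validation on the training part, followed by the hold-out check on the validation part that the walkthrough turns to next. Several details are assumptions rather than the presenter's exact script: the bundled `load_iris` data stands in for the CSV, `shuffle=True` is added because current scikit-learn requires it when `KFold` is given a `random_state`, `max_iter=1000` is set for logistic regression, and the final model is simply whichever scores best here.

```python
from sklearn import datasets, model_selection
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

seed = 6
scoring = "accuracy"

# Assumption: bundled iris data in place of the CSV used in the talk.
iris = datasets.load_iris()
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    iris.data, iris.target, test_size=0.20, random_state=seed
)

models = [
    ("LR", LogisticRegression(max_iter=1000)),
    ("LDA", LinearDiscriminantAnalysis()),
    ("KNN", KNeighborsClassifier()),
    ("CART", DecisionTreeClassifier(random_state=seed)),
    ("NB", GaussianNB()),
    ("SVM", SVC()),
]

results = {}
for name, model in models:
    # Same seed for every model, so each one sees identical folds.
    kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
    cv_scores = model_selection.cross_val_score(
        model, X_train, Y_train, cv=kfold, scoring=scoring
    )
    results[name] = cv_scores.mean()
    print(f"{name}: {cv_scores.mean():.3f} ({cv_scores.std():.3f})")

# Final check: fit the best-scoring model and test it on the held-back data.
best_name = max(results, key=results.get)
best_model = dict(models)[best_name]
best_model.fit(X_train, Y_train)
predictions = best_model.predict(X_test)

matrix = confusion_matrix(Y_test, predictions)
print(best_name, accuracy_score(Y_test, predictions))
print(matrix)
print(classification_report(Y_test, predictions))
```

The scores printed here need not match the ones on the presenter's screen, since the data source, library version, and fold shuffling may all differ.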

What is it? It is just the accuracy score using

different algorithms. Okay, when we are using

logistic regression, what is the accuracy rate? When we are using

the linear discriminant analysis algorithm, what is the accuracy,

and so on. Okay. So from the output, it seems that the LDA algorithm was

the most accurate model that we tested. Now, we want to get an idea

of the accuracy of the model on our validation set,

or the testing data set. So this will give us

an independent final check on the accuracy

of the best model. It is always valuable

to keep a validation data set, just in case you

made a slip during training, such as overfitting to the training set or a data leak;

both will result in an overly optimistic result. Okay, you can run the LDA model

directly on the validation set and summarize the result as

a final accuracy score, a confusion matrix, and a classification report.

Statistics and probability are essential because these disciplines

form the basic foundation of all machine learning

algorithms, deep learning,

artificial intelligence,

and data science. In fact, mathematics

and probability is behind everything around us

from shapes patterns and colors to the count of petals in a flower

mathematics is embedded in each and every

aspect of our lives. So I'm going to go ahead

and discuss the agenda for today with you

all. We're going to begin the session by understanding what data is. After that, we'll move on and look

at the different categories of data, like quantitative

and qualitative data. Then we'll discuss what

exactly statistics is, the basic terminologies in

statistics, and a couple of sampling techniques. Once we're done with that, we'll discuss the different

types of statistics, which involve descriptive

and inferential statistics. Then in the next session, we will mainly be focusing

on descriptive statistics. Here we'll understand

the different measures of center, measures

of spread, information gain, and entropy. We'll also understand all of these measures

with the help of a use case.

And finally, we'll discuss what

exactly a confusion matrix is. Once we've covered the entire descriptive

statistics module, we'll discuss the probability module. Here we'll understand

what exactly probability is and the different

terminologies in probability. We will also study the different

probability distributions. Then we'll discuss the types

of probability, which include marginal probability, joint

and conditional probability. Then we'll move on and discuss a use case

wherein we will see examples that show us how the different types

of probability work, and to better

understand Bayes' theorem,

we'll look at a small example. Also, I forgot to mention that at the end of the

descriptive statistics module, we'll be running a small demo

in the R language. So for those of you

who don't know much about R, I'll be explaining

every line in depth, but if you want to have

a more in-depth understanding of R, I'll leave

a couple of blogs and a couple of videos

in the description box. You all can definitely

check out that content. Now after we've completed the

probability module, we'll discuss the inferential statistics

module. We'll start this module by understanding what point

estimation is. We'll discuss what a confidence interval is

and how you can estimate the confidence interval. We'll

also discuss margin of error, and we'll understand all

of these concepts by looking at a small use case.

We'll finally end the inferential

statistics module by looking at what hypothesis

testing is. Hypothesis testing is a very important part

of inferential statistics. So we'll end the session

by looking at a use case that discusses how

hypothesis testing works, and to sum everything up, we'll look at a demo that explains how

inferential statistics works. Right? So guys, there's

a lot to cover today. So let's move ahead and take

a look at our first topic, which is: what is data? Now, this is

quite a simple question. If I ask any of you

what data is, you'll say that it's

a set of numbers or some sort of documents that are stored in my computer.

Now, data is actually everything.

All right, look around you. There

is data everywhere. Each click on your phone generates

more data than you know. Now, this generated data

provides insights for analysis and helps us make

better business decisions. This is why data is

so important. To give you a formal definition, data refers

to facts and statistics collected together

for reference or analysis. All right. This is the definition

of data in terms of statistics and probability. So as we know, data

can be collected, it can be measured and analyzed, and it can be visualized by

using statistical models and graphs. Now, data is divided

into two major subcategories. All right, so first we

have qualitative data and quantitative data. These are the two

different types of data. Under qualitative data, we have nominal

and ordinal data, and under quantitative data, we have discrete

and continuous data. Now, let's focus

on qualitative data. Now, this type of data deals with

characteristics and descriptors that can't be easily measured but can be observed subjectively. Now, qualitative data

is further divided into nominal and ordinal data. So nominal data is

any sort of data that doesn't have

any order or ranking. Okay.

An example of nominal

data is gender. Now, there is no ranking in gender. There's only male, female,

or other, right? There is no one, two, three, four, or any sort

of ordering in gender. Race is another example of nominal data. Now, ordinal data is basically an

ordered series of information. Okay, let's say

that you went to a restaurant. Okay. Your information is stored

in the form of a customer ID. All right. So basically you are represented

by a customer ID. Now, you would have rated

their service as either good or average. All right, that's

what ordinal data is, and similarly they'll have

a record of other customers who visit the restaurant

along with their ratings. All right. So any data which has

some sort of sequence or some sort of order

to it is known as ordinal data. All right, so guys, this is pretty simple

to understand. Now, let's move on and look

at quantitative data.

So quantitative data

basically deals with numbers and things. Okay, you can understand that from the word quantitative

itself. Quantitative is basically quantity, right? So it deals with numbers;

it deals with anything that you can measure

objectively, right? So there are two types of quantitative data: there is

discrete and continuous data. Now, discrete data is also

known as categorical data, and it can hold a finite number

of possible values. Now, the number of students

in a class is a finite number. All right, you can't

have an infinite number of students in a class. Let's say in your fifth grade, there were a hundred students

in your class. All right, there weren't

an infinite number, but there was a definite, finite number

of students in your class. Okay, that's discrete data. Next.

We have continuous data. Now, this type of data

can hold an infinite number of possible values. Okay. So when I say the weight

of a person is an example of continuous data, what I mean to say is

my weight can be 50 kgs, or it can be 50.1 kgs, or it can be 50.001 kgs,

or 50.0001, or 50.023, and so on, right? There are an infinite number

of possible values, right? So this is what I mean

by continuous data. All right. This is the difference between

discrete and continuous data. And also, I would like to mention

a few other things over here. Now, there are a couple

of types of variables as well. We have a discrete variable and we have a continuous

variable. A discrete variable is also known as

a categorical variable, and it can hold values

of different categories. Let's say that you have

a variable called message, and there are two types

of values that this variable can hold. Let's say that your message

can either be a spam message or a non-spam message. Okay, that's when you call

a variable a discrete or categorical variable.

All right, because it

can hold values that represent different

categories of data. Now, continuous variables

are basically variables that can store an

infinite number of values. So the weight of a person

can be denoted as a continuous variable. All right, let's say there is

a variable called weight, and it can store an infinite number

of possible values. That's why we'll call it

a continuous variable. So guys, basically a

variable is anything that can store a value, right? So if you associate any sort

of data with a variable, then it will become

either a discrete variable or a continuous variable. There are also dependent and

independent types of variables. Now, we won't discuss all

of that in depth because that's pretty understandable. I'm sure all of you know what an independent variable

and a dependent variable are, right? A dependent variable is

any variable whose value depends on another

independent variable. So guys, that much

knowledge I expect all of you to have. All right.

So now let's move on and look

at our next topic, which is: what is statistics? Now, coming

to the formal definition of statistics, statistics is

an area of applied mathematics which is concerned with data collection,

analysis, interpretation, and presentation. Now, usually when I speak about statistics,

people think statistics is all about analysis, but statistics has other parts

to it: data collection is also a part of statistics, as are data

interpretation and presentation. All of this comes under statistics. You are

going to use statistical methods to visualize data, to collect

data, and to interpret data. All right, so this area

of mathematics deals with understanding how data can be used

to solve complex problems.

Okay. Now I'll give you

a couple of examples of problems that can be solved

by using statistics. Okay, let's say that your company

has created a new drug that may cure cancer. How would you conduct

a test to confirm the drug's effectiveness? Now, even though this sounds

like a biology problem, this can be

solved with statistics. All right, you will have

to create a test which can confirm

the effectiveness of the drug. All right, this is a common problem that can be solved

using statistics. Let me give you

another example. You and a friend are at a baseball

game, and out of the blue

he offers you a bet that neither team will hit

a home run in that game. Should you take the bet? All right, here you just

discuss the probability of whether you'll win or lose. All right, this

is another problem that comes under statistics. Let's look at another example. The latest sales data

has just come in, and your boss wants

you to prepare a report for management on places where the company

could improve its business.

What should you look for? And what should you

not look for? Now, this problem involves a lot of data analysis. You'll have to

look at the different variables that are causing

your business to go down, or you'll have to look

at a few variables that are increasing

the performance of your models and thus growing your business. All right, so this involves

a lot of data analysis, and the basic idea behind data analysis is

to use statistical techniques in order to figure

out the relationship between different variables or different components

in your business. Okay. So now let's move on and look at our next topic, which is basic

terminologies in statistics. Now, before you dive deep

into statistics, it is important that you understand

the basic terminologies used in statistics. The two most important

terminologies in statistics are population and sample.

So throughout the statistics

course, or throughout any problem that you're trying

to solve with statistics, you will come

across these two words, which are population and sample.

Now, a population is a collection or a set of individuals

or objects or events whose properties

are to be analyzed. Okay. So basically you can refer

to the population as the subject that you're trying to analyze.

Now, a sample is just like the word suggests: it's a subset of the population. So you have to make sure

that you choose the sample in such a way that it represents

the entire population. All right. It shouldn't focus on one part

of the population; instead, it should represent

the entire population. That's how your sample

should be chosen. So a well-chosen sample

will contain most of the information about a

particular population parameter. Now, you must be wondering

how one can choose a sample that best represents

the entire population. Now, sampling is a statistical method that deals with the selection

of individual observations within a population.

So sampling is performed in order to infer statistical

knowledge about a population. All right, if you

want to understand the different statistics

of a population, like the mean, the median, the mode,

or the standard deviation or the variance of a population, then you're going

to perform sampling. All right, because it's not reasonable for

you to study a large population and find out the mean, median,

and everything else. So why is sampling

performed, you might ask? What is the point of sampling? We can just study

the entire population. Now guys, think of a scenario wherein you're asked

to perform a survey about the eating habits

of teenagers in the US. So at present there are

over 42 million teens in the US, and this number is growing as we are speaking

right now, correct? Is it possible to survey each

of these 42 million individuals about their health? Is it possible? Well, it might be possible, but this will take

forever to do.

Obviously, it's

not reasonable to go around knocking on each door and asking what

your teenage son eats and all of that, right? This is not very reasonable. That's why sampling is used. It's a method wherein a sample

of the population is studied in order to draw inferences

about the entire population. So it's basically

a shortcut to studying the entire population. Instead

of taking the entire population and finding out

all the solutions, you're just going to take

a part of the population that represents the

entire population, and you're going to perform

all your statistical analysis, your inferential statistics,

on that small sample. All right, and that sample basically

represents the entire population. All right, so I'm sure

I have made clear to you all what a sample is

and what a population is. Now, there are two main types

of sampling techniques that we'll discuss today.

We have probability sampling

and non-probability sampling. Now, in this video we'll only be focusing on

probability sampling techniques, because non-probability sampling

is not within the scope of this video. All right, we'll only discuss

the probability part because we're focusing on statistics and

probability, correct? Now again, under

probability sampling, we have three different types. We have random

sampling, systematic sampling, and stratified sampling. All right, and just

to mention the different types of non-probability sampling, we have

snowball, quota, judgment, and convenience sampling. All right. Now guys,

in this session, I'll only be

focusing on probability. So let's move on and look at the different types

of probability sampling. So what is probability sampling? It is a sampling technique in which samples

from a large population are chosen by using

the theory of probability. All right, so there

are three types of probability sampling. All right, first we have

random sampling. Now, in this method, each member of the population

has an equal chance of being selected in the sample. All right. So each and every individual

or each and every object in the population

has an equal chance of being a part of the sample.

That's what random

sampling is all about. Okay, you are randomly going

to select any individual or any object. So this way each individual has an equal chance

of being selected, correct? Next, we have systematic sampling. Now, in systematic sampling,

every nth record is chosen from the population to be

a part of the sample. All right. Now refer to this image that I've shown over here:

out of these six groups, every second group

is chosen as a sample. Okay. So every second record

is chosen here, and this is how systematic sampling works. Okay, you're

selecting every nth record, and you're going to add

that to your sample. Next, we have stratified sampling. Now, in this type of technique, a stratum

is used to form samples from a large population. So what is a stratum?

A stratum is basically a subset of the population that shares at

least one common

characteristic. So let's say that your population has a mix

of both male and female. So you can create two strata out of this: one will have

only the male subset, and the other will have

the female subset. All right, this is what a stratum is:

it is basically a subset of the population that shares at least

one common characteristic. All right, in our example,

it is gender. So after you've created the strata, you're going

to use random sampling on the strata, and you're going

to choose a final sample. So random sampling means that all of the individuals in each stratum

will have an equal chance of being selected

in the sample, correct? So guys, these were

the three different types of sampling techniques. Now, let's move on and look

at our next topic, which is the different

types of statistics. So after this, we'll be looking at the more

advanced concepts of statistics, right? So far we discussed

the basics of statistics, which is basically what statistics is,

the different sampling techniques, and the

terminologies in statistics.
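The three probability sampling techniques covered above can be sketched with Python's standard `random` module. The function names and the twelve-person toy population are hypothetical; the systematic version takes a fixed starting position for simplicity, whereas in practice the start is often chosen at random.

```python
import random


def simple_random_sample(population, k, rng):
    """Every member has an equal chance of being picked."""
    return rng.sample(population, k)


def systematic_sample(population, step, start=0):
    """Pick every `step`-th record, beginning at index `start`."""
    return population[start::step]


def stratified_sample(population, key, k_per_stratum, rng):
    """Group members into strata by `key`, then sample randomly within each."""
    strata = {}
    for member in population:
        strata.setdefault(key(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, k_per_stratum))
    return sample


rng = random.Random(42)  # seeded so the run is repeatable

# Toy population of (id, gender) pairs: six male, six female.
people = [(i, "male" if i % 2 == 0 else "female") for i in range(12)]

random_pick = simple_random_sample(people, 4, rng)
every_second = systematic_sample(people, step=2, start=1)
by_gender = stratified_sample(people, key=lambda p: p[1], k_per_stratum=2, rng=rng)

print(random_pick)
print(every_second)
print(by_gender)
```

Here gender plays the role of the shared characteristic that defines each stratum, as in the example from the talk.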

All right. Now we'll look at the different

types of statistics. So there are two major

types of statistics: descriptive statistics and inferential statistics.

In today's session, we will be discussing

both of these types of statistics in depth. All right, we'll also

be looking at a demo which I'll be running

in the R language in order to make

you understand what exactly descriptive and inferential

statistics are. So guys, we're just going to look

at the basics, so don't worry if you don't

have much knowledge; I'm explaining everything

from the basic level. All right, so guys, descriptive

statistics is a method which is used to describe

and understand the features of a specific data set by giving

a short summary of the data.

Okay, so it is mainly focused upon the

characteristics of data. It also provides a graphical

summary of the data. Now, in order to make you understand

what descriptive statistics is, let's suppose that you want to gift all

your classmates a t-shirt, so you need to study the average

shirt size of a student in the classroom. So if you were to use

descriptive statistics to study the average shirt size

of students in your classroom, then what you would do is you

would record the shirt size of all students in the class, and then you would find out

the maximum, minimum, and average shirt size of the class. Okay. So coming to inferential

statistics, inferential statistics makes inferences and predictions about

a population based on a sample of data taken

from the population. Okay. So in simple words, it generalizes to a large data set

and it applies probability to draw a conclusion.

Okay. So it allows you

to infer data parameters based on a statistical model

by using sample data. So if we consider

the same example of finding the average shirt size

of students in a class, in inferential statistics you will take a sample set of the class, which is basically a few people

from the entire class. All right, you will already

have grouped the class into large, medium, and small. All right, in this method

you basically build a statistical model and expand it for the entire

population in the class. So guys, that was a brief

understanding of descriptive and inferential statistics.
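To make the contrast concrete, here is a small sketch of the shirt-size example using only the standard library. Everything in it is assumed for illustration: shirt sizes are encoded as made-up chest measurements in centimeters, the class has 40 students, and the inferential part simply estimates the class mean from a random sample of 10 with a rough standard error.

```python
import random
import statistics

rng = random.Random(0)  # seeded so the run is repeatable

# Hypothetical chest measurements (cm) for a class of 40 students.
class_sizes = [rng.choice([86, 91, 96, 102, 107]) for _ in range(40)]

# Descriptive statistics: measure everyone and summarize what was measured.
descriptive = {
    "min": min(class_sizes),
    "max": max(class_sizes),
    "mean": statistics.mean(class_sizes),
}

# Inferential statistics: measure a sample and generalize to the whole class.
sample = rng.sample(class_sizes, 10)
estimate = statistics.mean(sample)
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(descriptive)
print(f"estimated class mean: {estimate:.1f} (+/- about {standard_error:.1f})")
```

The descriptive summary is exact for the class because every student was measured; the inferential estimate is cheaper to obtain but carries uncertainty, which the standard error roughly quantifies.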

So that's the difference

between descriptive and inferential. Now,

in the next section,

we will go in depth

about descriptive statistics. All right, so let's discuss more

about descriptive statistics. So like I mentioned earlier, descriptive

statistics is a method that is used to describe

and understand the features of a specific data set by giving

short summaries about the sample and measures of the data.

There are two important measures

in descriptive statistics. We have measure

of central tendency, which is also known as measure of center and we have

measures of variability. This is also known as

measures of spread. Ed so measures of center

include mean median and mode now what is measures of center measures of the center

are statistical measures that represent the summary

of a data set? Okay, the three main measures

of center are mean median and mode coming

to measures of variability or measures of spread. We have range

interquartile range variance and standard deviation. All right. So now let's discuss each of these measures

in a little more. Up starting with

the measures of center. Now, I'm sure all of you know what

the mean is. The mean is basically the average

of all the values in a sample.

How do you measure the mean? I

hope all of you know how the mean is measured. If there are 10 numbers and you want to find the mean

of these 10 numbers,

all you have to do is

add up all the 10 numbers and divide

the sum by n, where n represents the number

of samples in your data set. All right, since we

have 10 numbers, we're going to

divide this by 10. All right, this will

give us the average or the mean. So to better

understand the measures of central tendency, let's look at an example. Now the data set over here is

basically the cars data set and it contains a few variables. All right, it has

something known as cars. It has miles per gallon (mpg), cylinder

type, displacement, horsepower and rear axle ratio. All right, all of these measures

are related to cars. Okay. So what you're going

to do is use descriptive analysis and analyze

each of the variables in the sample data set for the mean, standard

deviation, median, mode and so on.

So let's say that you want

to find out the mean or the average horsepower of the cars among

the population of cars. Like I mentioned earlier, what you'll do is check

the average of all the values. So in this case, we will take the sum

of the horsepower of each car and divide that

by the total number of cars. Okay, that's exactly what I've done here

in the calculation part. So this 110

basically represents the horsepower

for the first car. All right, similarly, I've just added up all

the values of horsepower for each of the cars and I've divided the sum by 8. Now

8 is basically the number of cars in our data set.

All right, so 103.625

is what our mean or average

horsepower is. All right. Now, let's understand

what the median is with an example. Okay. So to define it, the median

is basically the central value of the sample set.

All right, you can see

that it is the middle value. So if we want to find

out the center value of the miles per gallon

among the population of cars, first we'll arrange

the mpg values in ascending or descending order and choose the middle value.

Right, in this case we have

eight values,

which is an even number of entries. So whenever you have an even

number of data points or samples in your data set, you're going

to take the average of the two middle values.

If we had nine values over here, we could easily figure

out the middle value and choose

that as the median. But since there is an even number

of values, we're going to take the average

of the two middle values. All right, so 22.8 and 23

are my two middle values and I'm taking

the mean of those two, and hence I get

22.9, which is my median. All right. Lastly, let's look at

how mode is calculated.
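Before moving on to the mode, here is a minimal Python sketch of the mean and median steps described above. The eight mpg readings are hypothetical placeholders (the on-screen data is not in the transcript); they are only chosen so that the two middle values are 22.8 and 23, as in the example. The helper names `manual_mean` and `manual_median` are also just illustrative.

```python
from statistics import mean, median

# Hypothetical miles-per-gallon readings for eight cars (not the on-screen data).
mpg = [21.0, 21.0, 22.8, 21.4, 23.0, 24.4, 26.0, 30.4]


def manual_mean(values):
    """Add up all the values and divide by how many there are."""
    return sum(values) / len(values)


def manual_median(values):
    """Sort the values; take the middle one, or average the two middle ones."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print("mean:", manual_mean(mpg))
print("median:", manual_median(mpg))
# The standard library gives the same answers.
print("library mean:", mean(mpg), "library median:", median(mpg))
```

With an even number of readings, the median here is the average of the fourth and fifth sorted values, which is the same rule used in the example.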

So what is the mode? The value that is most recurrent in the sample set is known as

the mode, or basically the value that occurs most often. So let's say that we want to find out

the most common type of cylinder among the population of cars. All we have to do

is check the value which is repeated

the most number of times. Here we can see that the cylinders

come in two types. We have cylinders of type

4 and cylinders of type 6, right? So take a look at the data set. You can see that the most

recurring value is 6, right? Counting them up,

we have five cars with 6 cylinders

and three cars with 4 cylinders.

So our mode is going

to be 6, since 6 is more recurrent than 4. So guys, those were the measures of the center, or the measures

of central tendency. Now, let's move on and look

at the measures of the spread. All right. Now, what is a measure

of spread? A measure of spread, sometimes also called a measure of dispersion, is used

to describe the variability in a sample or population. Okay, you can think

of it as some sort of deviation in the sample. All right. So you measure this with the help of the different

measures of spread. We have range,

interquartile range, variance and standard deviation. Now range is pretty

self-explanatory, right? It is a measure of

how spread apart the values in a data set are.

The range can be calculated as shown in this formula.
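As a quick check of the cylinder example and the range formula (range = maximum value minus minimum value), here is a small sketch. The counts of three 4-cylinder and five 6-cylinder cars come from the example; the order of the list and the function names are made up for illustration.

```python
from collections import Counter

# Five 6-cylinder cars and three 4-cylinder cars, as counted in the example.
# The ordering of the list is arbitrary.
cylinders = [6, 6, 4, 6, 6, 4, 6, 4]


def mode_of(values):
    """Return the value that occurs most often."""
    return Counter(values).most_common(1)[0][0]


def value_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)


print("mode:", mode_of(cylinders))
print("range:", value_range(cylinders))
```

Here `mode_of(cylinders)` gives 6, matching the count above, and the range of the cylinder column is 6 minus 4.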

So you're basically going

to subtract the minimum value in your data set from the maximum value

in your data set. That's how you calculate

the range of the data. All right, next we

have interquartile range. So before we discuss

interquartile range, let's understand what a quartile is, right? So quartiles basically tell us

about the spread of a data set by breaking the data set

into quarters. Okay, just like how the median breaks

the data into two parts, the quartiles will break it into four quarters. So to better understand

how quartiles and the interquartile range are calculated, let's look at a small example. Now this data set basically

represents the marks of a hundred students

ordered from the lowest to the highest scores, right?

So the quartiles lie in the following ranges.

The first quartile, which is also known as Q1, lies between the 25th

and 26th observation. All right. So if you look at this,

I've highlighted the 25th and the 26th observation. So you can calculate

Q1, or the first quartile, by taking the average

of these two values. All right, since both

the values are 45, when you add them up

and divide them by two you'll still get 45. Now

the second quartile, or Q2, is between the 50th

and the 51st observation. So you're going to take

the average of 58 and 59 and you will get

a value of 58.5. Now, this is my second quartile.

The third quartile, Q3,

is between the 75th and the 76th observation. Here

again we'll take the average of the two values, which are the 75th value

and the 76th value, right, and you'll get a value of 71. All right, so guys,

this is exactly how you calculate

the different quartiles. Now, let's look at

what the interquartile range is. So IQR, or the interquartile

range, is a measure of variability based on dividing

a data set into quartiles. Now, the interquartile

range is calculated by subtracting Q1 from Q3. So basically,

your IQR is Q3 minus Q1. All right. Each quartile represents a quarter, which is 25%. All right. So guys, I hope all

of you are clear with the interquartile range

and what quartiles are. Now, let's look at

variance. Variance is basically a measure

that shows how much a random variable differs

from its expected value.
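Stepping back to the quartile example for a moment before going deeper into variance: the sketch below applies the same averaging rule (25th/26th, 50th/51st and 75th/76th observations) to a sorted list of exactly 100 values. The marks 1 to 100 are placeholders, since the actual marks are only shown on screen, and `quartiles_100` and `iqr` are hypothetical helper names.

```python
def quartiles_100(sorted_marks):
    """Q1, Q2, Q3 for exactly 100 sorted values, using the averaging rule
    from the example (25th/26th, 50th/51st, 75th/76th observations)."""
    if len(sorted_marks) != 100:
        raise ValueError("this sketch only handles 100 observations")

    def avg_of(k):
        # k is a 1-based observation number; average the k-th and (k+1)-th values.
        return (sorted_marks[k - 1] + sorted_marks[k]) / 2

    return avg_of(25), avg_of(50), avg_of(75)


def iqr(q1, q3):
    """Interquartile range: Q3 minus Q1."""
    return q3 - q1


marks = list(range(1, 101))  # placeholder marks 1..100, already sorted
q1, q2, q3 = quartiles_100(marks)
print(q1, q2, q3, iqr(q1, q3))

# Using the quartiles quoted in the example (Q1 = 45, Q3 = 71):
print(iqr(45, 71))
```

Other quartile conventions exist (for example, interpolation methods used by statistics libraries), so results on real data can differ slightly from this rule.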

Okay. It's basically the variation

in any variable. Now variance can be calculated by using

this formula. Right, here x basically represents

any data point in your data set, n is the total number

of data points in your data set and x bar is basically

the mean of the data points. All right. This is how you calculate

variance. Variance is basically computed from

the squares of deviations. Okay. That's why it says

s squared there. Now, let's look at what deviation is.

Deviation is just the difference between each element

and the mean. Okay, so it can be calculated

by using this simple formula, where x_i basically

represents a data point and mu is the mean

of the population. All right, this is exactly how you calculate the deviation.

Now population variance and sample variance

differ depending on whether you're calculating

the variance of your population data

set or of your sample data set. That is the only

difference between population and sample variance. So the formula for population variance

is pretty self-explanatory.

So x is basically

each data point, mu is the mean of the population and N is the number of data points

in your data set. All right. Now, let's look at sample variance. Sample variance is the average of squared

differences from the mean. All right, here x_i

is any data point or any sample in your data

set and x bar is the mean of your sample. All right. It's not the mean

of your population, it's the mean of your sample. And if you notice, n here is

a lowercase n, the number of data points in your sample.
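To make the two variances concrete, here is a hedged sketch. The on-screen formulas are not in the transcript, so the divisors are an assumption: the population variance divides by N, and the sample variance uses the common n - 1 convention (which is also what Python's `statistics.variance` does). The data values are made up.

```python
from statistics import pvariance, variance


def population_variance(values):
    """Mean of squared deviations from the population mean mu (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Squared deviations from the sample mean x_bar, divided by n - 1.
    Assumption: the usual unbiased-estimator convention."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

print("population variance:", population_variance(data))
print("sample variance:", sample_variance(data))
# Cross-check against the standard library.
print(pvariance(data), variance(data))
```

For the same data the sample variance comes out a little larger than the population variance, because it divides by one fewer data point.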

And this is basically

the difference between sample and population variance. I hope that is clear. Coming to standard deviation, it is

the measure of dispersion of a set of data from its mean. All right, so it's basically

the deviation from your mean. That's what standard deviation

is. Now to better understand how the measures

of spread are calculated,

let's look at a small use case. So let's say that Daenerys

has 20 dragons. They have the numbers

9, 2, 5, 4 and so on, as shown on the screen. What you have to do is

work out the standard deviation. All right, in order to calculate

the standard deviation,

you need to know the mean, right? So first you're going to find

out the mean of your sample set. So how do you calculate

the mean? You add all the numbers in your data set and divide by the total number

of samples in your data set, so you get a value of 7 here. Then you calculate the right-hand side of

your standard deviation formula. All right, so from

each data point you're going to subtract the mean

and you're going to square that. All right.
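The whole standard deviation calculation being walked through here can be sketched end to end as follows. Only the first four values (9, 2, 5, 4) are spoken in the transcript; the other sixteen values in the list are an assumption, chosen so that the mean is 7 and the result agrees with the quoted standard deviation. The calculation divides by the number of values (population form), which is what "mean of the squared differences" describes.

```python
import math

# First four values are from the example; the remaining sixteen are assumed.
dragons = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4]

# Step 1: the mean.
mean_value = sum(dragons) / len(dragons)

# Step 2: subtract the mean from each data point and square the result.
squared_diffs = [(x - mean_value) ** 2 for x in dragons]

# Step 3: the mean of the squared differences (the variance).
variance_value = sum(squared_diffs) / len(squared_diffs)

# Step 4: the square root of the variance is the standard deviation.
std_dev = math.sqrt(variance_value)

print("mean:", mean_value)
print("first squared differences:", squared_diffs[:5])
print("standard deviation:", round(std_dev, 3))
```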

So when you do that, you will get

the following result. You'll basically get

4, 25, 4, 9, 25 and so on. So finally you

will just find the mean of the squared differences. All right. So your standard deviation will come up to 2.983

once you take the square root. So guys, this is pretty simple. It's a simple

mathematical technique. All you have to do is

substitute the values in the formula. All right. I hope this was clear

to all of you. Now let's move on and discuss the next topic,

which is Information Gain and entropy now. This is one of my favorite

topics in statistics. It's very interesting and

this topic is mainly involved in machine learning algorithms, like decision trees

and random forest.

All right, it's very important for you to know how Information Gain and entropy

really work and why they are so essential in building

machine learning models. We'll focus on the statistical parts

of information gain and entropy, and after that we'll discuss

a use case and see how information gain and entropy are used

in decision trees. So for those of you who don't know what

a decision tree is it is basically a machine

learning algorithm. You don't have to know

anything about this. I'll explain

everything in depth.

So don't worry. Now, let's look at what exactly

entropy and information gain are. Entropy is

basically the measure of any sort of uncertainty

that is present in the data. All right, so it can be measured

by using this formula. So here S is the set

of all instances in the data set, or all the data items

in the data set, N is the number of different classes in your data set and

p_i is the event probability. Now this might seem

a little confusing to you all, but when we

go to the use case, you'll understand all

of these terms even better. All right, coming

to information gain: as the word suggests,

information gain indicates how much information

a particular feature or a particular variable gives

us about the final outcome. Okay, it can be measured

by using this formula. So again here H(S)

is the entropy of the whole data set

S, |S_j| is the number of instances with the j-th value of an attribute A, |S| is

the total number of instances in the data set, V is the set of distinct values

of an attribute A, H(S_j) is the entropy

of the subset of instances S_j, and H(A, S) is

the entropy of an attribute A. Even though

this seems confusing, I'll clear out the confusion. All right, let's discuss

a small problem statement where we will understand how information gain and entropy are used to study

the significance of a model. So like I said, information gain and entropy are very

important statistical measures that let us understand the significance of

a predictive model. Okay, to get a clearer

understanding,

Let's look at a use case. All right now suppose we

are given a problem statement. All right, the statement is

that you have to predict whether a match can be played or not by studying

the weather conditions. So the predictor variables here

are outlook, humidity and wind; day is also a predictor variable. The target variable

is basically play. All right, the target variable

is the variable that you're trying to predict. Okay. Now the value of the target

variable will decide whether or not a game

can be played.

All right, so that's

why play has two values. It has no and yes: no, meaning that the weather

conditions are not good. And therefore you

cannot play the game. Yes, meaning that the weather

conditions are good and suitable for you to play the game. Alright, so that was

a problem statement. I hope the problem statement

is clear to all of you now to solve such a problem. We make use of something

known as decision trees. So guys think

of an inverted tree and each branch of the tree

denotes some decision.

All right, each branch

is known as a branch node, and at each branch node you're going to take

a decision in such a manner that you will get an outcome

at the end of the branch. All right. Now this figure

here basically shows that out of 14 observations,

9 observations result in a yes, meaning that out of 14 days, the match can be played

only on nine days. All right, so here if you see, on day 1, day

2, day 8, day 9 and day 11 the outlook has been sunny. All right, so basically we try

to cluster the data set depending on the outlook. So when the outlook is sunny, this is our data set;

when the outlook is overcast, this is what we have; and when the outlook is

rain, this is what we have. All right, so when it is sunny we have

two yeses and three nos.

Okay, when the

outlook is overcast, we have all four

as yeses, meaning that on the four days

when the outlook was overcast, we can play the game. All right. Now when it comes to rain, we have three yeses

and two nos. All right. So if you notice here, the decision is being made by

choosing the Outlook variable as the root node. Okay. So the root node is basically the topmost node

in a decision tree. Now, what we've done here is

we've created a decision tree that starts with

the Outlook node.

All right, then you're splitting

the decision tree further depending on other parameters

like sunny, overcast and rain. All right, now like we know,

Outlook has three values: sunny, overcast and rain.

So let me explain this in a more in-depth manner. Okay. So what you're doing

here is making the decision tree by choosing

the Outlook variable at the root node. The root node is

basically the topmost node in a decision tree. Now the Outlook node has three

branches coming out from it, which are sunny,

overcast and rain. So basically Outlook can have three values:

either it can be sunny, it can be overcast

or it can be rainy. Okay, now these three values

are assigned to the immediate branch

nodes, and for each of these values the possibility of play being equal

to yes is calculated. So the sunny and the rain branches

will give you an impure output.

Meaning that there is a mix

of yes and no, right? There are two yeses

here and three nos here. There are three yeses here

and two nos over here. But when it comes

to the overcast value, it results in a hundred

percent pure subset. All right, this shows that

the overcast value will result in a definite

and certain output. This is exactly what entropy

is used to measure. All right, it calculates

the impurity or the uncertainty. All right, so the lesser

the uncertainty or the entropy of a variable, the more

significant that variable is. So when it comes to overcast,

there's literally no impurity in the data set. It is a hundred percent

pure subset, right? So we want variables like these

in order to build a model. All right, now, we don't always get lucky

and we don't always find variables that will result

in pure subsets.

That's why we have

the measure entropy. So the lesser the entropy of

a particular variable, the more significant that variable

will be. So in a decision tree, the root node is assigned

the best attribute so that the decision tree

can predict the most precise outcome, meaning

that on the root node you should have the most

significant variable. All right, that's why

we've chosen Outlook. All right, now some of you might ask

me, why haven't you chosen overcast? Okay, overcast

is not a variable. It is a value

of the Outlook variable. All right. That's why we've chosen

Outlook here, because it has a hundred percent pure subset,

which is overcast. All right. Now the question in your head is,

how do I decide which variable or attribute best splits

the data? Now right now, I know I looked at the data and I told you that, you know, here we have

a hundred percent pure subset, but what if it's

a more complex problem and you're not able

to understand which variable will best split the data? So guys, when it comes to decision trees,

information gain and entropy will help you understand which variable

will best split the data set.

All right, or which variable you

have to assign to the root node, because whichever variable

is assigned to the root node will best split the data set, and it has to be the most

significant variable. All right. So to do this

we need to use information gain and entropy. So from the total

of the 14 instances that we saw, nine

of them said yes and 5 of the instances said no, that you cannot play

on that particular day. All right. So how do you

calculate the entropy? So this is the formula;

you just substitute the values in the formula. So when you substitute

the values in the formula, you will get a value of 0.940.
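The substitution just described can be sketched in a few lines of Python. This assumes the standard entropy formula, H(S) = -(sum over classes of p_i * log2(p_i)), where p_i is the fraction of instances in class i; the function name `entropy` is just illustrative.

```python
import math


def entropy(class_counts):
    """Entropy (in bits) of a set, given how many instances fall in each class."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count == 0:
            continue  # an empty class contributes nothing to the sum
        p = count / total
        result -= p * math.log2(p)
    return result


# 9 "yes" days and 5 "no" days out of 14.
print(entropy([9, 5]))

# A pure subset (all four overcast days are "yes") has zero entropy.
print(entropy([4, 0]))
```

An even split such as `entropy([7, 7])` gives the maximum of 1 bit for two classes, which is the most uncertain case.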

All right. This is the entropy or this is the uncertainty

of the data present in the sample. Now in order to ensure that we choose the best variable

for the root node, let us look at all

the possible combinations that you can use

on the root node. Okay, so these are all

the possible combinations: you can either have

Outlook, or you can have windy, humidity or temperature. Okay, these are four variables

and you can have any one of these variables

as your root node. But how do you select which variable best

fits the root node? That's what we are going

to see by using Information Gain and entropy.

So guys, now the task at hand

is to find the information gain for each of these attributes. All right. So for Outlook, for windy, for

humidity and for temperature, we're going to find

out the information gain. Right, now

a point to remember is that the variable that results in the highest

information gain must be chosen because it will give us the most

precise output information. All right. So the information gain for

attribute windy: we'll calculate that first. Here we have six instances of true

and eight instances of false.

Okay. So when you substitute all

the values in the formula, you will get a value

of 0.048. Now, this is a very low value

for information gain. All right, so the information that you're going to get from

the windy attribute is pretty low. So let's calculate

the information gain of attribute Outlook.

All right, so from the total

of 14 instances, we have five instances

which say sunny, four instances which are overcast

and five instances which are rainy. All right, for sunny we have two yeses and three nos,

for overcast we have all four as yes, and for rainy we have

three yeses and two nos. Okay. So when you calculate

the information gain of the Outlook variable,

you'll get a value of 0.247. Now compare

this to the information gain of the windy attribute. This value is

actually pretty good, right? We have 0.247,

which is a pretty good value for information gain. Now, let's look

at the information gain of attribute humidity.

Now over here, we have seven instances

which say high and seven instances which say normal. Right, and under

the high branch node we have three instances

which say yes, and the remaining four instances

say no. Similarly, under the normal branch we have six

instances which say yes and one instance which says no. All right. So when you calculate

the information gain for the humidity variable, you're going to get

a value of 0.151.

Now, this is also

a pretty decent value, but when you compare it

to the information gain of the attribute Outlook, it

is less, right? Now, let's look at the information

gain of attribute temperature.

So basically the temperature

attribute can hold hot, mild and cool. Okay, under hot we have two instances

which say yes and two instances for no. Under mild, we have four instances of yes

and two instances of no, and under cool we have

three instances of yes and one instance of no. All right. When you calculate the information gain

for this attribute, you will get a value

of 0.029, which is again very low.
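The four information gain values can be reproduced with a short sketch. It assumes the usual definition: gain = H(S) minus the weighted average of the entropies of the subsets produced by the split. The yes/no counts per branch for Outlook, humidity and temperature are the ones read out above. For windy, only the branch sizes (six true, eight false) are spoken, so the yes/no split inside each windy branch is an assumption chosen to be consistent with the 9 yes / 5 no totals.

```python
import math


def entropy(class_counts):
    """Entropy (in bits) of a set, given the number of instances per class."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count > 0:
            p = count / total
            result -= p * math.log2(p)
    return result


def information_gain(parent_counts, branches):
    """Parent entropy minus the size-weighted entropy of each branch."""
    total = sum(parent_counts)
    weighted = sum(sum(b) / total * entropy(b) for b in branches)
    return entropy(parent_counts) - weighted


parent = [9, 5]  # [yes, no] over all 14 days

# Each branch is [yes, no] for one value of the attribute.
splits = {
    "outlook": [[2, 3], [4, 0], [3, 2]],      # sunny, overcast, rain
    "windy": [[3, 3], [6, 2]],                # true, false (assumed split)
    "humidity": [[3, 4], [6, 1]],             # high, normal
    "temperature": [[2, 2], [4, 2], [3, 1]],  # hot, mild, cool
}

gains = {name: information_gain(parent, b) for name, b in splits.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")

best = max(gains, key=gains.get)
print("best attribute for the root node:", best)
```

The printed gains agree with the quoted figures to within rounding (humidity rounds to 0.152 rather than 0.151), and Outlook comes out as the attribute with the highest gain.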

So what you can summarize

from here is, if we look at the information gain for each

of these variables, we'll see that for Outlook we have the maximum gain. All right, we have

0.247,

which is the highest information gain value, and you must always choose

the variable with the highest information gain to split

the data at the root node. So that's why we assign

the Outlook variable at the root node. All right, so guys, I hope this use case was clear.

If any of you have doubts, please keep commenting

those doubts. Now, let's move on and look at what

exactly a confusion matrix is. The confusion matrix

is the last topic for descriptive statistics,

right? After this, I'll be running a short demo

where I'll be showing you how you can calculate

mean, median, mode, standard deviation, variance

and all of those values by using R. Okay.

So let's talk about

the confusion matrix now, guys. What is the confusion matrix?

Now don't get confused. This is not a complex

topic. A confusion matrix is a matrix that is often used to describe

the performance of a model. All right, and this

is specifically used for classification models, or classifiers. What it does is

calculate the accuracy, or

performance, of your classifier by comparing your actual results

and your predicted results. All right. So this is what it looks like: true positive, true

negative and all of that. Now this is a little confusing. I'll get back to what

exactly true positive, true negative and all

of this stand for. For now, let's look at an example and

let's try and understand what exactly a confusion matrix is. So guys, I've made sure that I put examples

after each and every topic, because it's important you understand the practical

part of statistics. All right, statistics is

not just about theory; you need

to understand how calculations are done in statistics. Okay. So now

let's look at a small use case.

Okay, let's consider that you're given data

about 165 patients, out of which 105

patients have a disease and the remaining 60 patients

don't have a disease. Okay. So what you're going to do is

build a classifier that predicts by using these

165 observations. You'll feed all of these 165 observations

to your classifier and it will predict

the output every time a new patient's details are fed

to the classifier. Right, now out of these 165 cases,

let's say that

the classifier predicted yes 110 times

and no 55 times. All right, so yes

basically stands for: yes, the person has a disease,

and no stands for: no, the person does

not have a disease. All right, that's

pretty self-explanatory. But yeah, so it predicted

110 times that the patient has a disease

and 55 times that the patient

doesn't have a disease. However, in reality only

105 patients in the sample have

the disease and 60 patients do not have

the disease, right? So how do you calculate

the accuracy of your model? You basically build

the confusion matrix. All right. This is what the matrix looks like. n basically denotes

the total number of observations that you have, which is 165 in our case.

Actual denotes the actual values in the data set and predicted denotes

the values predicted by the classifier. So the actual value is no here and the predicted

value is no here. So your classifier

was correctly able to classify 50 cases as no. All right, since both

of these are no, 50 cases it was correctly able

to classify. But 10 of these cases it

incorrectly classified, meaning that your actual value here

is no but your classifier predicted it as yes. All right, that's why this 10

is over here. Similarly, it wrongly predicted that five patients

do not have the disease whereas they actually

did have the disease, and it correctly

predicted 100 patients who have the disease.
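The four cells of this confusion matrix, and the accuracy that follows from them, can be laid out in a small sketch. The counts are the ones from the example; the dictionary layout and variable names are just one illustrative way to hold them, and accuracy is taken here as correct predictions divided by all predictions.

```python
# Keys are (actual label, predicted label); values are counts from the example.
confusion = {
    ("no", "no"): 50,     # true negatives
    ("no", "yes"): 10,    # false positives
    ("yes", "no"): 5,     # false negatives
    ("yes", "yes"): 100,  # true positives
}

total = sum(confusion.values())

# Row totals: how many patients actually do / do not have the disease.
actual_yes = confusion[("yes", "no")] + confusion[("yes", "yes")]
actual_no = confusion[("no", "no")] + confusion[("no", "yes")]

# Column totals: how many times the classifier said yes / no.
predicted_yes = confusion[("no", "yes")] + confusion[("yes", "yes")]
predicted_no = confusion[("no", "no")] + confusion[("yes", "no")]

# Accuracy: correct predictions (diagonal cells) over all predictions.
accuracy = (confusion[("no", "no")] + confusion[("yes", "yes")]) / total

print("total:", total)
print("actual yes/no:", actual_yes, actual_no)
print("predicted yes/no:", predicted_yes, predicted_no)
print("accuracy:", round(accuracy, 3))
```

The row and column totals line up with the numbers in the example: 105 actual yes, 60 actual no, 110 predicted yes and 55 predicted no.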

All right. I know this is

a little bit confusing. But if you look

at these values: no, no, 50, meaning that it correctly

predicted 50 values as no. No, yes means that it

wrongly predicted yes for values where it

was supposed to predict no. All right. Now what exactly is this true positive,

true negative and all of that? I'll tell you what

exactly it is. So true positives are the cases

in which we predicted yes and they actually do

have the disease. All right, so it is

basically this value: we predicted yes here, and they

did have the disease. So we have 100 true positives,

right? Similarly, true negatives are where we predicted no and they don't have

the disease, meaning that this is correct. False positives are where we predicted

yes, but they do not

actually have the disease. All right, this is also known as a type

1 error. False negatives are where we predicted no, but they actually

do have the disease. So guys, basically true positives and true negatives are the

correct classifications. All right. So this was the confusion matrix and I hope this concept

is clear. Again guys, if you have doubts, please comment your doubt

in the comment section. So guys, that was

the entire descriptive statistics module, and now we

will discuss about probability. Okay. So before we understand

what exactly probability is, let me clear out a very

common misconception people often tend to ask

me this question. What is the relationship between

statistics and probability? So probability and statistics

are related fields.

All right. So probability is

a mathematical method used for statistical analysis. Therefore we can say that probability and

statistics are interconnected branches of mathematics that deal with analyzing the

relative frequency of events. So they're very

interconnected fields: probability makes

use of statistics and statistics makes use of probability. All right, they're

very interconnected fields. So that is the relationship

between statistics and probability. Now, let's understand

what exactly probability is. So probability is the measure of how likely an event

is to occur. To be more precise, it is the ratio of desired outcomes

to the total outcomes. Now, the probabilities of all outcomes always sum up

to 1. The probability of a single event

cannot go beyond one. Okay. So your probability

can be 0 or it can be 1, or it can be in the form

of decimals like 0.52 or 0.55, or it can be

in the form of 0.5, 0.7, 0.9. But its value will always stay

between 0 and 1.

Okay, the famous example of probability is the rolling

a dice example. So when you roll a dice you get

six possible outcomes, right? You get the one, two, three, four, five and six

faces of a dice. Now each possibility only

has one outcome. So what is the probability

that on rolling a dice you will get 3? The probability

is 1 by 6, right, because there's only one face which has the number 3 on it.

Out of six faces, there's only one face

which has the number three. So the probability of getting 3 when you roll a dice

is 1 by 6. Similarly, if you want to find

the probability of getting the number 5, again the probability is

going to be 1 by 6. All right, so all

of these will sum up to 1. All right, so guys, this is

exactly what probability is. It's a very simple concept;

we have all learnt it from 8th standard onwards, right? Now, let's understand the

different terminologies that are related to probability. Now, there are three terminologies

that you often come across when we talk about probability. We have something known

as the random experiment.

Okay, it's basically

an experiment or a process for which the outcomes cannot be

predicted with certainty. All right. That's why you use probability. You're going to use probability

in order to predict the outcome with some sort of certainty. The sample space

is the entire possible set of outcomes of a random

experiment. An event is one or more outcomes

of an experiment. So consider the example

of rolling a dice. Now, let's say that you want

to find out the probability of getting a 2

when you roll the dice. Okay. So rolling the dice

is the random experiment. The sample space is basically

your entire set of possibilities. Okay. So the one, two, three, four, five and six faces are there

and out of that you need to find the probability

of getting a 2, right? So all the possible outcomes will basically represent

your sample space.
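The dice example can be written down directly: the sample space is the set of faces, an event is a subset of it, and the probability is the ratio of favourable outcomes to total outcomes. This is a minimal sketch that assumes a fair dice (every face equally likely); the `probability` helper is a hypothetical name.

```python
from fractions import Fraction

# Sample space: every possible outcome of rolling one dice.
sample_space = {1, 2, 3, 4, 5, 6}


def probability(event, space):
    """Favourable outcomes divided by total outcomes (all equally likely)."""
    favourable = event & space  # ignore anything outside the sample space
    return Fraction(len(favourable), len(space))


# Event: getting a 2.
p_two = probability({2}, sample_space)
print("P(2) =", p_two)

# An event can contain more than one outcome, e.g. an even number.
p_even = probability({2, 4, 6}, sample_space)
print("P(even) =", p_even)

# The probabilities of all single outcomes add up to 1.
total = sum(probability({face}, sample_space) for face in sample_space)
print("sum over all faces =", total)
```

Using `Fraction` keeps the answers as exact ratios such as 1/6 instead of rounded decimals.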

Okay. So 1 to 6 are all your possible

outcomes; this represents the sample space. An event is

one or more outcomes of an experiment. So in this case

my event is to get a 2 when I roll a dice, right? So my event is

getting a 2 when I roll a dice. So guys, this is basically what

random experiment, sample space and event really mean. All right. Now, let's discuss

the different types of events. There are two types of events

that you should know about: disjoint and non-disjoint

events. Disjoint events are events that do not have

any common outcome. For example, if you draw a single card

from a deck of cards, it cannot be a king

and a queen, correct? It can either be a king

or it can be a queen. Now, non-disjoint

events are events that have common outcomes. For example, a student

can get a hundred marks in statistics and a hundred

marks in probability.

All right, and also the outcome of a ball delivered

can be a no ball and it can be a 6, right? So this is what non-disjoint

events are. All right, these are very simple

to understand, right? Now, let's move on and look

at the different types of probability distribution. All right, I'll be discussing three main topics under probability

distributions.
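Before the distributions, the disjoint versus non-disjoint idea from a moment ago can be checked with Python sets. The king/queen pair is the example from the lecture; the king/hearts pair is an added illustration of non-disjoint events (one card can be both), and the deck representation is just one convenient choice.

```python
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]

# Every card is a (rank, suit) pair; the deck is the sample space for one draw.
deck = set(product(ranks, suits))

# Events are subsets of the deck.
kings = {card for card in deck if card[0] == "K"}
queens = {card for card in deck if card[0] == "Q"}
hearts = {card for card in deck if card[1] == "hearts"}


def are_disjoint(event_a, event_b):
    """Two events are disjoint when they share no outcome."""
    return event_a.isdisjoint(event_b)


print("king and queen disjoint?", are_disjoint(kings, queens))
print("king and heart disjoint?", are_disjoint(kings, hearts))
print("shared outcomes:", kings & hearts)
```

A single card cannot be both a king and a queen, so those events share nothing, while the king of hearts belongs to both the kings event and the hearts event.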

I'll be talking

about the probability density function, the normal distribution

and the central limit theorem. Okay, the probability density

function, also known as PDF, is concerned with the relative likelihood for

a continuous random variable to take on a given value. All right. So the PDF gives the probability of a variable that lies

between the range a and b. So basically what you're trying

to do is find the probability

of a continuous random variable over a specified range. Okay. Now this graph denotes the PDF

of a continuous variable. Now, this graph is also known

as the bell curve, right? It's famously called

the bell curve because of its shape, and there are

three important properties that you need to know about

a probability density function.

First, the graph of a PDF is continuous over a range. This is because you're finding the probability that a continuous variable lies between a and b.

The second property is that the area bounded by the curve of a density function and the x-axis is equal to 1. Basically, the area below the curve is equal to 1 because it denotes probability, and a probability cannot be more than one; it has to be between 0 and 1.

Property number three is that the probability that a random variable assumes a value between a and b is equal to the area under the PDF bounded by a and b.
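As a quick sanity check of the second and third properties, here is a minimal Python sketch. It assumes a standard normal curve as the example PDF and a simple trapezoidal sum for the area; the function names `normal_pdf` and `area_under_pdf` are illustrative and not part of the course material.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x (used here only as an example PDF)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_pdf(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b with the trapezoidal rule."""
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width

# Property 2: the total area is (approximately) 1; the range -10..10 covers nearly all of it.
total_area = area_under_pdf(normal_pdf, -10, 10)

# Property 3: P(a <= X <= b) is the area under the curve between a and b.
prob_between = area_under_pdf(normal_pdf, -1, 1)

print(round(total_area, 4), round(prob_between, 4))
```

The same `area_under_pdf` helper works for any other density function you pass in, as long as the range you integrate over covers the part of the curve you care about.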

Okay. What this means is that the probability value is denoted by the area under the graph. So whatever area you get here is the probability that the random variable will lie in the range between a and b.

All right, so I hope you have understood the probability density function. It's basically used to find the probability that a continuous random variable takes a value between a and b.

Now let's look at our next distribution, which is the normal distribution. The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean. The idea behind this function is that data near the mean occurs more frequently than data far away from the mean. In other words, most of the data set is clustered around the mean.

Okay. So if you take the data around the mean, it covers the bulk of the data set. Like the probability density function we just saw, the normal distribution appears as a bell curve. When it comes to the normal distribution, there are two important factors: the mean of the population and the standard deviation.

Okay, so the mean determines the location of the center of the graph, and the standard deviation determines the height and the width of the graph. If the standard deviation is large, the curve is going to look something like this: short and wide. If the standard deviation is small, the curve is tall and narrow. All right, so this was it about the normal distribution.
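The "short and wide" versus "tall and narrow" behaviour can be seen from the height of the normal curve at its mean, which is 1 / (sigma * sqrt(2 * pi)). The sketch below is only an illustration; the sigma values are arbitrary choices, not numbers from the course.

```python
import math

def normal_peak_height(sigma):
    """Height of the normal curve at its mean: 1 / (sigma * sqrt(2 * pi))."""
    return 1.0 / (sigma * math.sqrt(2 * math.pi))

# Arbitrary example values: a small, a medium and a large standard deviation.
for sigma in (0.5, 1.0, 2.0):
    print(f"sigma={sigma}: peak height={normal_peak_height(sigma):.3f}")
```

Because the total area under the curve always stays at 1, a lower peak has to come with a wider curve, and a higher peak with a narrower one.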

Now let's look at the central limit theorem. The central limit theorem states that the sampling distribution of the mean of independent random variables will be normal or nearly normal if the sample size is large enough. Now, that's a little confusing, so let me break it down for you.

In simple terms, if we had a large population and we drew many samples from it, then the mean of all the sample means would be almost equal to the mean of the entire population, and the sample means themselves would be approximately normally distributed.

Right. So if you compare the mean of each sample, it will almost be equal to the mean of the population. This graph gives a clearer picture of the central limit theorem: you can see each sample here, and the mean of each sample lies almost along the same line. So this is exactly what the central limit theorem states.

Now, the accuracy, or the resemblance to the normal distribution, depends on two main factors. The first is the number of sample points that you consider, and the second is the shape of the underlying population. The shape is described, among other things, by the mean and the standard deviation of the population.
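A small simulation can make the theorem more concrete. The sketch below is an illustration under assumptions of my own: the population is a made-up, right-skewed set of exponential values, and the sample size of 50 and the 2,000 repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# A deliberately non-normal (right-skewed) population of made-up values.
population = [random.expovariate(1.0) for _ in range(100_000)]
population_mean = statistics.fmean(population)

def sample_means(data, sample_size, num_samples):
    """Draw num_samples random samples and return the mean of each one."""
    return [statistics.fmean(random.sample(data, sample_size))
            for _ in range(num_samples)]

means = sample_means(population, sample_size=50, num_samples=2_000)

print("population mean:       ", round(population_mean, 3))
print("mean of sample means:  ", round(statistics.fmean(means), 3))
print("spread of sample means:", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):       ", round(statistics.pstdev(population) / 50 ** 0.5, 3))
```

Even though the population itself is skewed, the sample means cluster around the population mean, and their spread is close to the population standard deviation divided by the square root of the sample size.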

So guys, in short, the central limit theorem states that the sample means will be approximately normally distributed, centered on the mean of the actual population. That's what the central limit theorem states.

This holds true only for a large data set. For a small data set there are more deviations than for a large data set, because of the scaling factor: the smallest deviation in a small data set will change the value drastically, but in a large data set a small deviation will hardly matter.

Now let's move on and look at our next topic, which is the different types of probability. This is an important topic, because most of your problems can be solved by understanding which type of probability you should use to solve the problem. We have three important types of probability.

We have marginal, joint and conditional probability, so let's discuss each of these.

The probability of an event occurring, unconditioned on any other event, is known as marginal probability or unconditional probability. Let's say you want to find the probability that a card drawn is a heart. That probability is 13 by 52, since there are 52 cards in a deck and 13 of them are hearts. So your marginal probability will be 13 by 52. That's marginal probability.

Now let's understand what joint probability is. Joint probability is a measure of two events happening at the same time.

Okay. Let's say that the two events are A and B. The probability of events A and B both occurring is the probability of the intersection of A and B.

For example, if you want to find the probability that a card is a four and red, that would be a joint probability, because you're looking for a card that is a 4 and is also red in color. The answer is 2 by 52, because we have one 4 in hearts and one 4 in diamonds, and both of these are red. So our probability is 2 by 52, and if you reduce it further, it is 1 by 26. This is what joint probability is all about.

Moving on, let's look at what exactly conditional probability is. If the probability of an event or an outcome is based on the occurrence of a previous event or outcome, then you call it a conditional probability. The conditional probability of an event B is the probability that B will occur given that an event A has already occurred. If A and B are dependent events, then the expression for conditional probability is P(B|A) = P(A and B) / P(A).

Now, the term on the left-hand side, P(B|A), is basically the probability of event B occurring given that event A has already occurred. Like I said, if A and B are dependent events, then this is the expression. But if A and B are independent events, then the expression for conditional probability is simply P(B|A) = P(B). And guys, P(A) and P(B) are obviously the probability of A and the probability of B, right?
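The three types of probability can be checked by simply counting cards. The sketch below is a minimal illustration, assuming a standard 52-card deck and the usual relation P(B|A) = P(A and B) / P(A); the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def is_heart(card):
    return card[1] == "hearts"

def is_red(card):
    return card[1] in ("hearts", "diamonds")

def is_four(card):
    return card[0] == "4"

def is_red_four(card):
    return is_four(card) and is_red(card)

def probability(event, given=None):
    """P(event | given) by counting cards; with no condition the whole deck is used."""
    pool = [card for card in deck if given is None or given(card)]
    favorable = sum(1 for card in pool if event(card))
    return Fraction(favorable, len(pool))

marginal = probability(is_heart)                  # 13 hearts out of 52 cards
joint = probability(is_red_four)                  # 2 red fours out of 52 cards
conditional = probability(is_four, given=is_red)  # fours among the 26 red cards

print(marginal, joint, conditional)
```

Here being a four and being red happen to be independent events, so the conditional probability of a four given a red card equals the plain probability of a four.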

Let's move on. Now, in order to understand conditional probability, joint probability and marginal probability, let's look at a small use case. We're going to take a data set which examines the salary package and the training undergone by candidates. In this data set there are 60 candidates without training and 45 candidates who have enrolled for Edureka's training. The task here is to assess the training against the salary package.

Let's look at this in a little more depth. In total we have 105 candidates, out of which 60 have not enrolled for Edureka's training and 45 have enrolled for Edureka's training. This is a small survey that was conducted, and this is the rating of the salary package that they got.

If you read through the data, you can see there were five candidates without Edureka's training who got a very poor salary package. Similarly, there are 30 candidates with Edureka's training who got a good package. So guys, basically you're comparing the salary package of a person depending on whether or not they've enrolled for Edureka's training. This is our data set.

Now let's look at our problem statement: find the probability that a candidate has undergone Edureka's training. Quite simple. Which type of probability is this? This is marginal probability. The probability that a candidate has undergone Edureka's training is 45 divided by 105, since 45 is the number of candidates with Edureka's training and 105 is the total number of candidates.

So you get a value of approximately 0.43. That's the probability that a candidate has undergone Edureka's training.

Next question: find the probability that a candidate has attended Edureka's training and also has a good package. This is obviously a joint probability problem. So how do you calculate this? Since our table is nicely formatted, we can directly see that the number of people who have a good package along with Edureka's training is 30. So out of 105 people, 30 people have Edureka's training and a good package.

They're specifically asking for people with Edureka's training, remember that. The question is to find the probability that a candidate has attended Edureka's training and also has a good package. So we need to consider two factors: a candidate who has attended Edureka's training and who has a good package.

So clearly that number is 30, and 30 divided by the total number of candidates, which is 105, gives you the answer, roughly 0.29.

Next, we have: find the probability that a candidate has a good package given that he has not undergone training. This is clearly conditional probability, because here you're defining a condition: you want the probability that a candidate has a good package given that he has not undergone any training.

The number of people who have not undergone training is 60, and out of those, five have got a good package. That's why this is 5 by 60 and not 5 by 105: because the question clearly says "has a good package given that he has not undergone training", you have to consider only the people who have not undergone training. Only five people who have not undergone training have got a good package. So 5 divided by 60 gives you a probability of around 0.08, which is pretty low, right?
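The three answers in this use case can be reproduced with a few lines of Python. This is only a sketch that uses the counts quoted in the walkthrough; the rest of the survey table is not reproduced, and the variable names are my own.

```python
from fractions import Fraction

# Counts quoted in the walkthrough; the full survey table is not reproduced here.
total_candidates = 105
with_training = 45
without_training = 60
good_package_with_training = 30
good_package_without_training = 5

# Marginal: P(training)
marginal = Fraction(with_training, total_candidates)
# Joint: P(training and good package)
joint = Fraction(good_package_with_training, total_candidates)
# Conditional: P(good package | no training), so only the 60 untrained candidates count
conditional = Fraction(good_package_without_training, without_training)

for name, p in [("marginal", marginal), ("joint", joint), ("conditional", conditional)]:
    print(f"{name}: {p} = {float(p):.3f}")
```

Note how only the conditional probability changes its denominator: the condition "has not undergone training" shrinks the group being counted from 105 to 60.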

So this was all about the different types of probability. Now let's move on and look at our last topic in probability, which is Bayes' theorem.

Now guys, Bayes' theorem is a very important concept when it comes to statistics and probability. It is majorly used in the Naive Bayes algorithm. For those of you who aren't aware, Naive Bayes is a supervised learning classification algorithm, and it is widely used for email spam filtering. A lot of you might have noticed that if you open up Gmail, you have a folder called Spam. That filtering is carried out through machine learning, and Naive Bayes is a classic algorithm for it.

So now let's discuss what exactly Bayes' theorem is and what it denotes. Bayes' theorem is used to show the relation between one conditional probability and its inverse.

All right. Basically it's nothing but the probability of an event occurring based on prior knowledge of conditions that might be related to the same event. Mathematically, Bayes' theorem is represented like this: P(A|B) = P(B|A) * P(A) / P(B).

As shown in this equation, the term on the left-hand side is what is known as the posterior, which is the probability of occurrence of A given an event B.

The second term, P(B|A), is referred to as the likelihood; it measures the probability of occurrence of B given an event A. P(A) is known as the prior, which refers to the actual probability distribution of A, and P(B) is again the probability of B. This is Bayes' theorem.

In order to better understand Bayes' theorem, let's look at a small example. Let's say that we have three bowls: bowl A, bowl B and bowl C. Bowl A contains two blue balls and four red balls. Bowl B contains eight blue balls and four red balls.

Bowl C contains one blue ball and three red balls. Now, if we draw one ball from each bowl, what is the probability of drawing a blue ball from bowl A if we know that we drew exactly two blue balls in total? If you didn't understand the question, please read it again; I'll pause for a second or two.

Right, so I hope all of you have understood the question. What I'm going to do is draw a blueprint for you and tell you how exactly to solve the problem, but I want you all to give me the solution to this problem. I'll draw a blueprint.

I'll tell you exactly what the steps are, but I want you to come up with the solution on your own. The formula is also given to you. Everything is given to you; all you have to do is come up with the final answer.

Let's look at how you can solve this problem. First of all, let A be the event of picking a blue ball from bowl A, and let X be the event of picking exactly two blue balls, because these are the two events whose probabilities we need to calculate. So there are two probabilities that you need to consider here.

One is the event of picking a blue ball from bowl A, and the other is the event of picking exactly two blue balls. These two are represented by A and X respectively. What we want is the probability of occurrence of event A given X: given that we're picking exactly two blue balls, what is the probability that we are picking a blue ball from bowl A?

By the definition of conditional probability, this is exactly what our equation will look like: P(A|X) = P(A and X) / P(X). This is the occurrence of event A given event X, this is the probability of A and X, and this is the probability of X alone.

What we need to do is find these two probabilities: the probability of A and X occurring together, and the probability of X. That is the entire solution.

So how do you find the probability of X? X represents the event of picking exactly two blue balls, and this can happen in three ways. In the first case, you pick a blue ball from bowl A and a blue ball from bowl B, and a red ball from bowl C. In the second case, you pick a blue ball from bowl A and a blue ball from bowl C, and a red ball from bowl B. In the third case, you pick a blue ball from bowl B and a blue ball from bowl C, and a red ball from bowl A. These are the three ways in which it is possible, so you need to find the probability of each of these and add them up.

The next step is to find the probability of A and X occurring together.

This is the sum of terms one and two, because in both of these cases you're picking a blue ball from bowl A. So go ahead, find out this probability and let me know your answer in the comment section. We'll see if you get the answer right.

I gave you the entire solution to this; all you have to do is substitute the values. I'm going to pause on this screen for a second or two so that you can go through it more clearly. Remember that you need to calculate two probabilities. The first is the probability of picking a blue ball from bowl A together with picking exactly two blue balls.
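If you would like to check your hand calculation, the blueprint can be sketched in Python. This is a minimal illustration, assuming the three draws are independent and using the ball counts from the problem; the variable names are my own. Running it gives the final number, so try the problem by hand first.

```python
from fractions import Fraction
from itertools import product

# Probability of drawing a blue ball from each bowl, from the counts in the problem.
p_blue = {
    "A": Fraction(2, 2 + 4),
    "B": Fraction(8, 8 + 4),
    "C": Fraction(1, 1 + 3),
}

p_x = Fraction(0)        # P(X): exactly two blue balls in total
p_a_and_x = Fraction(0)  # P(A and X): blue from bowl A and exactly two blue in total

# Enumerate every blue / not-blue combination for the three draws (assumed independent).
for outcome in product([True, False], repeat=3):
    draws = dict(zip("ABC", outcome))
    p = Fraction(1)
    for bowl, is_blue in draws.items():
        p *= p_blue[bowl] if is_blue else 1 - p_blue[bowl]
    if sum(outcome) == 2:
        p_x += p
        if draws["A"]:
            p_a_and_x += p

# Definition of conditional probability: P(A|X) = P(A and X) / P(X)
p_a_given_x = p_a_and_x / p_x
print(p_x, p_a_and_x, p_a_given_x)
```

Using `Fraction` keeps every intermediate value exact, which makes it easy to compare against a result worked out on paper.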

Okay, the second probability you need to calculate is that of the event of picking exactly two blue balls. These are the two probabilities you need to calculate, so remember that, and this is the solution. All right, so guys, make sure you mention your answers in the comment section.

For now, let's move on and look at our next topic, which is inferential statistics. We just completed the probability module; now we will discuss inferential statistics, which is the second type of statistics. We discussed descriptive statistics earlier.

Like I mentioned earlier, inferential statistics, also known as statistical inference, is a branch of statistics that deals with forming inferences and predictions about a population based on a sample of data taken from that population. The question you should ask is: how does one form inferences or predictions from a sample? The answer is that you use point estimation.

Now you must be wondering what point estimation is. Point estimation is concerned with using the sample data to compute a single value which serves as an approximate value, or the best estimate, of an unknown population parameter. That's a little confusing, so let me break it down for you.

For example, in order to calculate the mean of a huge population, we first draw a sample from the population and then find the sample mean. The sample mean is then used to estimate the population mean. This is basically a point estimate: you're estimating the value of one of the parameters of the population, in this case the mean.
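Here is a minimal Python sketch of that idea. The population is made up purely for illustration (100,000 random values), and the sample size of 200 is an arbitrary choice.

```python
import random
import statistics

random.seed(1)  # fixed seed for a repeatable illustration

# Hypothetical population: 100,000 made-up values standing in for "a huge population".
population = [random.gauss(170, 10) for _ in range(100_000)]

sample = random.sample(population, 200)    # draw a sample from the population
point_estimate = statistics.fmean(sample)  # the sample mean estimates the population mean

print("sample mean (point estimate):", round(point_estimate, 2))
print("population mean:             ", round(statistics.fmean(population), 2))
```

In practice you would not be able to compute the population mean at all; it is printed here only to show how close a single-number estimate from 200 values can get.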

This is what point estimation is. There are two main terms in point estimation: the estimator and the estimate. The estimator is a function of the sample that is used to find the estimate. In this example, it's the sample mean: a function that calculates the sample mean is the estimator, and the realized value of the estimator is the estimate. So I hope point estimation is clear.

Now, how do you find the estimates? There are four common ways in which you can do this.

The first one is the method of moments. Here, you form an equation on the sample data set and then analyze the corresponding equation for the population, like the population mean, the population variance and so on. In simple terms, you're taking some known facts about the population and extending those ideas to the sample. Once you do that, you can analyze the sample and estimate more complex values.

Next, we have maximum likelihood. This method uses a model to estimate a value: it picks the parameter value under which the observed data is most probable, so there's a lot of probability involved in this method.

Next, we have the Bayes estimator. This works by minimizing the average risk, or expected error. The Bayes estimator has a lot to do with Bayes' theorem. Let's not get into the depth of these estimation methods.

Finally, we have the best unbiased estimators. In this method, there are several unbiased estimators that can be used to approximate a parameter.

So guys, these were a couple of methods that are used to find the estimate, but the most well-known method is known as interval estimation.

Okay. This is one of the most important estimation methods, and this is where the confidence interval also comes into the picture. Apart from interval estimation, we also have something known as the margin of error. I'll be discussing all of this in the upcoming slides.

First, let's understand what an interval estimate is. An interval, or range of values, which is used to estimate a population parameter is known as an interval estimate. That's very understandable. Basically, you're going to estimate the value of a parameter; let's say you're trying to find the mean of a population. What you do is build a range, and your value will lie in that range or interval. This way your output is more likely to capture the true value, because you have not predicted a single point; instead, you have estimated an interval within which your value might occur.

Now, this image clearly shows how a point estimate and an interval estimate differ. The interval estimate is more informative, because you are not focusing on just one particular value or point in order to estimate the parameter. Instead, you're saying that the value might be within this range, between the lower confidence limit and the upper confidence limit. This denotes the range or the interval.

If you're still confused about interval estimation, let me give you a small example. If I state that I will take 30 minutes to reach the theater, this is known as point estimation. But if I state that I will take between 45 minutes and an hour to reach the theater, this is an example of interval estimation. I hope it's clear now.

Now, interval estimation gives rise to two important statistical terms: one is the confidence interval and the other is the margin of error. It's important that you pay attention to both of these terms. The confidence interval is one of the most significant measures used to check how reliable a machine learning model is.

All right. So what is a confidence interval? The confidence interval is the measure of your confidence that the estimated interval contains the population parameter, whether that is the population mean or any other parameter. Statisticians use confidence intervals to describe the amount of uncertainty associated with a sample estimate of a population parameter. Now guys, this is a lot of definition, so let me make you understand the confidence interval with a small example.

Let's say that you perform a survey of a group of cat owners to see how many cans of cat food they purchase in one year. You test your statistics at the 99 percent confidence level and you get a confidence interval of (100, 200). This means that you think cat owners buy between 100 and 200 cans in a year, and since the confidence level is 99%, it shows that you're very confident that the results are correct.

Okay, I hope all of you are clear with that. So your confidence interval here is 100 to 200 and your confidence level is 99%. That's the difference between the confidence interval and the confidence level: your value is expected to lie within your confidence interval, and your confidence level shows how confident you are about your estimation. I hope that was clear.

Let's look at the margin of error. The margin of error, for a given level of confidence, is the greatest possible distance between the point estimate and the value of the parameter that it is estimating. You can say that it is the maximum deviation from the actual point estimate.

Now, the margin of error can be calculated using this formula: E = z_c * σ / √n. Here z_c denotes the critical value for the chosen confidence level, and this is multiplied by the standard deviation divided by the root of the sample size; n is the sample size.

Now let's understand how you can estimate the confidence intervals. The level of confidence, which is denoted by c, is the probability that the interval estimate contains the population parameter. Let's say that you're trying to estimate the mean. The interval between minus z and z, or rather the area beneath this curve between them, is nothing but the probability that the interval estimate contains the population parameter. It should basically contain the value that you are predicting.

These two points are known as critical values; they are basically your lower limit and your upper limit for the confidence level. There's also something known as the z-score. This score can be found using the standard normal table. If you look it up anywhere on Google, you'll find the z-score table, or the standard normal table.

To understand how this is done, let's look at a small example. Let's say that the level of confidence is 90%. This means that you are 90% confident that the interval contains the population mean. The remaining 10%, out of a hundred percent, is equally distributed over the two tail regions, so you have 0.05 here and 0.05 over here. On either side of c you distribute the leftover percentage. The z-scores are then read from the table, as I mentioned before: 1.645 is read from the standard normal table. So guys, that's how you estimate the level of confidence.

To sum it up, let me tell you the steps that are involved in constructing a confidence interval. First, you start by identifying a sample statistic. This is the statistic that you will use to estimate a population parameter; it can be anything, like the mean of the sample. Next, you select a confidence level; the confidence level describes the uncertainty of a sampling method. After that, you find the margin of error. We discussed the margin of error earlier, so you find this based on the equation that I explained on the previous slide. Then you finally specify the confidence interval.
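These steps can be sketched in Python. Instead of a printed table, the sketch uses `statistics.NormalDist` from the standard library to look up the critical value; the function names and the sample numbers (mean 150, standard deviation 20, 64 observations) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """z such that the central area of the standard normal curve equals the level."""
    tail = (1 - confidence_level) / 2  # area left over in each tail
    return NormalDist().inv_cdf(1 - tail)

def confidence_interval(sample_mean, std_dev, n, confidence_level):
    """Statistic -> confidence level -> margin of error -> interval."""
    margin = critical_value(confidence_level) * std_dev / n ** 0.5
    return sample_mean - margin, sample_mean + margin

# The 90% example from the walkthrough: 0.05 is left in each tail.
print(round(critical_value(0.90), 3))

# Hypothetical numbers, purely for illustration.
low, high = confidence_interval(sample_mean=150.0, std_dev=20.0, n=64,
                                confidence_level=0.95)
print(round(low, 2), round(high, 2))
```

A higher confidence level gives a larger critical value and therefore a wider interval, while a larger sample size shrinks the margin of error.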

All right. Now let's look at a problem statement to better understand this concept. A random sample of 32 textbook prices is taken from a local college bookstore. The mean of the sample is so-and-so and the sample standard deviation is this much. Use a 95% confidence level and find the margin of error for the mean price of all textbooks in the bookstore.

Now, this is a very straightforward question. If you want, you can read the question again. All you have to do is substitute the values into the equation.

All right, so guys, we know the formula for the margin of error. You take the z-score from the table. After that we have the standard deviation, which is 23.44, and n stands for the number of samples. Here the number of samples is 32, basically 32 textbooks. So your margin of error is going to be approximately 8.12. This is a pretty simple question.
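The substitution can be checked with a few lines of Python. The standard deviation and sample size are the ones quoted above; the z-score of 1.96 is the standard table value for a 95% confidence level and is stated here as an assumption, since the walkthrough does not read it out.

```python
import math

z_c = 1.96       # z-score for a 95% confidence level, from the standard normal table
std_dev = 23.44  # sample standard deviation quoted in the worked answer
n = 32           # number of textbooks in the sample

# Margin of error: E = z_c * sigma / sqrt(n)
margin_of_error = z_c * std_dev / math.sqrt(n)
print(round(margin_of_error, 2))
```

The result agrees with the margin of error of about 8.12 given in the walkthrough.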

All right. I hope all of you understood this. Now that you know the idea behind the confidence interval, let's move ahead to one of the most important topics in statistical inference, which is hypothesis testing. Basically, statisticians use hypothesis testing to formally check whether a hypothesis is accepted or rejected. Hypothesis testing is an inferential statistical technique used to determine whether there is enough evidence in a data sample to infer that a certain condition holds true for an entire population.

To understand the characteristics of a general population, we take a random sample and analyze the properties of the sample. We test whether or not the identified conclusion represents the population accurately, and finally we interpret the results. Whether or not to accept the hypothesis depends upon the probability value that we get from the test. To better understand this, let's look at a small example.

Before that, there are a few steps that are followed in hypothesis testing. You begin by stating the null and the alternative hypothesis; I'll tell you what exactly these terms are. Then you formulate an analysis plan. After that, you analyze the sample data, and finally you interpret the results.

Now, to understand the entire hypothesis testing process, we'll look at a good example. Consider four boys: Nick, John, Bob and Harry. These boys were caught bunking a class, and they were asked to stay back at school and clean the classroom as a punishment. So what John did is he decided that the four of them would take turns to clean the classroom.

He came up with a plan of writing each of their names on chits and putting them in a bowl. Every day they had to pick a name from the bowl, and that person had to clean the class. That sounds pretty fair.

Now it has been three days, and everybody's name has come up except John's. Assuming that this event is completely random and free of bias, what is the probability that John is not cheating? This can be solved by using hypothesis testing.

We'll begin by calculating the probability of John not being picked for a day. We're going to assume that the event is free of bias. The probability that John is not picked on a given day is 3 out of 4, which is 75%. 75% is fairly high.

So if John is not picked for three days in a row, the probability drops down to approximately 42%. Now let's consider a situation where John is not picked for 12 days in a row: the probability drops down to about 3.2%. At that point, the probability of John cheating becomes fairly high. So in order for statisticians to come to a conclusion, they define what is known as a threshold value.
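The numbers in this example are easy to reproduce. The sketch below assumes each day's draw is fair and independent, with four names in the bowl, and uses a threshold of 5% for comparison; the function name is hypothetical.

```python
def prob_not_picked(days, num_names=4):
    """Probability that one given name is never drawn in `days` fair, independent draws."""
    return (1 - 1 / num_names) ** days

threshold = 0.05  # a 5% threshold value, assumed here for the comparison

for days in (1, 3, 12):
    p = prob_not_picked(days)
    verdict = "below threshold" if p < threshold else "above threshold"
    print(f"{days:>2} day(s): {p:.3f} ({verdict})")
```

After 3 days the probability is still well above the threshold, so the streak could easily be luck; after 12 days it falls below 5%, which is where the fairness assumption starts to look doubtful.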

Right. Considering the above situation, if the threshold value is set to 5 percent, it would indicate that if the probability lies below 5%, then John is cheating his way out of detention. But if the probability is above the threshold value, then John is just lucky and his name isn't getting picked.

So probability and hypothesis testing give rise to two important components of hypothesis testing: the null hypothesis and the alternative hypothesis. The null hypothesis is basically when your result approves the assumption. The alternative hypothesis is when your result disproves the assumption. Therefore, in our example, if the probability of the event occurring is less than 5%, which it is after 12 days, then the event is biased, and hence it supports the alternative hypothesis.

So guys, with this we come to the end of this session. Let's go ahead and understand what exactly supervised learning is.

Supervised learning is where you have an input variable x and an output variable y, and you use an algorithm to learn the mapping function from the input to the output, as I mentioned earlier with the example of face detection. It is called supervised learning because the process of an algorithm learning from the training data set can be thought of as a teacher supervising the learning process.

Now let's have a look at the supervised learning steps, or rather the workflow. As you can see here, we have the historic data. Then we have random sampling: we split the data into the training data set and the testing data set. Using the training data set, with the help of supervised machine learning, we create a statistical model.

After we have a model which has been generated with the help of the training data set, we use the testing data set for prediction and testing. We get the output, and finally we have the model validation outcome. That was the training and testing.

Now, if we have a look at the prediction part of any particular supervised learning algorithm, the model is used for predicting the outcome of a new data set.
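The workflow described above can be sketched end to end in plain Python. Everything in this example is an assumption made for illustration: the "historic data" is synthetic, the split is 80/20, the model is a straight line fitted by least squares, and R-squared on the testing set stands in for the model validation step.

```python
import random

random.seed(42)  # repeatable illustration

# Hypothetical "historic data": y is roughly 3x + 5 plus some noise.
data = [(x, 3 * x + 5 + random.gauss(0, 2)) for x in range(100)]

# Random sampling: shuffle, then split 80/20 into training and testing sets.
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

def fit_line(points):
    """Least-squares slope and intercept for y = m*x + c."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in points)
         / sum((x - mean_x) ** 2 for x, _ in points))
    return m, mean_y - m * mean_x

def r_squared(points, m, c):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (m * x + c)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot

m, c = fit_line(train)         # build the model on the training set
score = r_squared(test, m, c)  # validate it on the held-out testing set
print(f"slope={m:.2f}, intercept={c:.2f}, test R^2={score:.3f}")

# Prediction on a new, unseen input.
new_x = 120
print("prediction for x=120:", round(m * new_x + c, 1))
```

Keeping the testing set out of the fitting step is the point of the split: the validation score then reflects how the model behaves on data it has not seen.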

Whenever the performance of the model degrades, or if there are any performance issues, the model is retrained with the help of new data.

Now, when we talk about supervised learning, there is not just one but quite a few algorithms. We have linear regression, logistic regression, decision trees, random forest and the Naive Bayes classifier.

Linear regression is used to estimate real values, for example the cost of houses, the number of calls or the total sales, based on continuous variables. That is what linear regression is.

Logistic regression is used to estimate discrete values, for example binary values like 0 and 1, yes or no, true or false, based on a given set of independent variables. For example, think of the chances of winning, which can be either true or false, or "will it rain today", which can be yes or no. Only when the output of a particular question is yes/no or binary do we use logistic regression.

Next, we have decision trees. These are used for classification problems, and they work for both categorical and continuous dependent variables. Random forest is an ensemble of decision trees; it generally gives better prediction accuracy than a single decision tree. So that is another type of supervised learning algorithm.

And finally we have

the need based classifier. So it was a classification technique based

on the Bayes theorem with an assumption of

Independence between predictors. A linear regression is one

of the easiest algorithms in machine learning. It is a statistical model that attempts to

show the relationship between two variables

with a linear equation. But before we drill down into the linear regression

algorithm in depth, I'll give you a quick overview

of today's agenda. We'll start the session with a quick overview

of what regression is, as linear regression

is one type of regression algorithm.

Once we learn about regression, its use cases, and its various

types, we'll learn about

the algorithm from scratch, where I'll teach you its mathematical

implementation first. Then we'll drill down

to the coding part and implement linear

regression using Python. In today's session we'll deal with the linear regression algorithm

using the least squares method, and check its goodness of fit, or how close the data is to the fitted regression line,

using the R-squared method. And then finally we'll optimize it

using the gradient descent method. In the last part,

the coding session, I'll teach you to implement

linear regression using Python. The coding session will be divided into two parts.

The first part will consist of linear regression

using Python from scratch, where you will use

the mathematical algorithm that you have learned

in this session.

And in the next part

of the coding session we'll be using scikit-learn

for direct implementation of linear regression. So let's begin our session

with what regression is. Regression analysis is a form of predictive

modeling technique which investigates the

relationship between a dependent and an independent variable.

Regression analysis involves graphing a line

over a set of data points that most closely fits

the overall shape of the data. A regression shows how the changes

in the dependent variable on the y-axis relate to the changes in the explanatory variable

on the x-axis. Fine. Now you would ask,

what are the uses of regression? Well, there are three major uses

of regression analysis, the first being determining the strength

of predictors. Regression might be used

to identify the strength of the effect that the independent variables

have on the dependent variable. Here you can ask questions

like: what is the strength

of the relationship between sales and marketing spending, or what

is the relationship between age and income? Second is forecasting an effect. Here regression

can be used to forecast the effects or impact of changes. That is, regression analysis

helps us understand how much the dependent variable

changes with a change in one or more

independent variables. For example, you can ask

a question like: how much additional sales income

will I get for each thousand dollars

spent on marketing? Third is trend forecasting. Here regression

analysis predicts trends and future values.

Regression analysis can

be used to get point estimates. Here you can ask questions like: what will be

the price of Bitcoin in the next six months? So the next topic is linear versus

logistic regression. By now I hope that you know

what a regression is. So let's move on

and understand its types. There are various kinds

of regression, like linear regression, logistic regression,

polynomial regression, and others, but for this session we'll be focusing on linear

and logistic regression. So let me tell

you what linear regression is and what logistic regression is, and then

we'll compare both of them. All right. So starting with

linear regression: in simple linear regression, we are interested in things

like y equals mx plus c.

What we are trying to find

is the correlation between the X and Y variables. This means that every value of x has a corresponding

value of y, and it is continuous. However,

in logistic regression, we are not fitting our data

to a straight line like in linear regression. Instead,

we are mapping Y versus X to a sigmoid function.

In logistic regression, what we find out is whether y is 1 or 0

for this particular value of x. Thus we are essentially

deciding a true or false value for a given value of x. Fine.

So as a core concept

of linear regression, you can say that the data

is modeled using a straight line. But in the case

of logistic regression the data is modeled using

a sigmoid function. Linear regression is used

with continuous variables. On the other hand,

logistic regression is used with categorical

variables. The output or the prediction

of a linear regression is the value of the variable. On the other hand,

the output or prediction of a logistic regression

is the probability of occurrence of the event. Now, how will you

check the accuracy and goodness of fit? In the case

of linear regression, there are various measures,

like loss, R squared, adjusted R squared, etc., while in the case

of logistic regression you have accuracy, precision,

recall, the F1 score, which is nothing but

the harmonic mean of precision and recall, then the ROC curve for determining the probability

threshold for classification, the confusion matrix, etc. There are many. All right. So summarizing the difference between linear and

logistic regression, you can say that the type of function you

are mapping to is the main point of difference between linear

and logistic regression. A linear regression

maps a continuous X to a continuous

Y. On the other hand, a logistic regression

maps a continuous X to a binary Y. So we can use logistic

regression to make category or true/false decisions

from the data. Fine. So let's move

on ahead. Next is the linear regression selection criteria, or you can say, when will

you use linear regression? The first is classification and regression capabilities.

Regression models predict a continuous variable, such as

the sales made on a day or the temperature of a city. Their reliance

on a polynomial, like a straight line,

to fit a data set poses a real challenge when it comes to building

a classification capability. Let's imagine that you fit

a line to the training points that you have. Now imagine you

add some more data points to it. In order to fit them,

what do you have to do? You have to change

your existing model, that is, maybe you have

to change the threshold itself.

This will happen

with each new data point you add to the model. Hence linear regression is not

good for classification models. Fine. Next is data quality. Each missing value

removes one data point that could optimize the regression. In

simple linear regression, outliers can significantly disrupt the outcome.

For now, just know that if you

remove the outliers, your model will usually become much better. All right. So this is about data quality. Next is computational complexity.

Linear regression is often not computationally expensive

compared to decision trees or clustering algorithms.

The order of complexity for n training examples

and X features usually falls in either big O of X squared or big O of Xn.

Next is comprehensible and transparent.

Linear regression models are easily comprehensible

and transparent in nature. They can be represented by

a simple mathematical notation to anyone and can be

understood very easily.

So these are some

of the criteria based on which you will select

the linear regression algorithm. All right. Next is where linear

regression is used. First is evaluating trends

and sales estimates. Linear regression

can be used in business to evaluate trends

and make estimates or forecasts. For example, if a company's sales have

increased steadily every month for the past few years, then

conducting a linear analysis on the sales data,

with monthly sales on the y-axis and time on the x-axis,

will give you a line that depicts the upward trend

in sales. After creating the trend line, the company

could use the slope of the line to forecast

sales in future months. Next is analyzing the impact of price changes.

Well, linear regression can be used to analyze

the effect of pricing on consumer behavior. For instance, if a company changes the price of a certain

product several times, then it can record the quantity

sold for each price level and then perform

a linear regression with sold quantity as the dependent variable and price

as the independent variable.

This would result in a line

that depicts the extent to which customers reduce

their consumption of the product as the price increases. This result would help us

in future pricing decisions. Next is assessment of risk

in the financial services and insurance domain. Linear regression can be used

to analyze risk. For example, a health insurance

company might conduct a linear regression analysis. How can it do that? It can do it

by plotting the number of claims per customer against the customer's age,

and it might discover that older customers tend to make more

health insurance claims. Well, the result

of such an analysis might guide important business decisions. All right, so by now you

have just a rough idea of what the linear regression

algorithm is, what it does, where it is used,

and when you should use it. Now let's move on

and understand the algorithm in depth. So suppose

you have the independent variable on the x-axis and the dependent

variable on the y-axis.

All right. Suppose these are the data points.

The independent variable

is increasing on the x-axis, and so is the dependent

variable on the y-axis. So what kind of linear

regression line would you get? You would get a positive

linear regression line, as the slope would

be positive. Next, suppose you have an independent

variable on the x-axis which is increasing and, on the other hand, a

dependent variable on the y-axis that is decreasing. What kind of line

will you get in that case? You will get

a negative regression line,

as the slope

of the line is negative. This particular line, that is the line y equals mx plus c, is the line

of linear regression, which shows the relationship

between the independent variable and the dependent variable. This line is known

as the line of linear regression. Okay. So let's add

some data points to our graph. These

are some observations or data points on our graph. Let's plot some more. Okay. Now all our data points

are plotted. Now our task is to create a regression line,

or the best fit line. All right. Once our regression

line is drawn, it's the task

of prediction. Now suppose this is our estimated value,

or the predicted value, and this is our actual value. Okay. So our main

goal is to reduce this error, that is, to reduce the distance between the estimated

or predicted value and the actual value. The best

fit line would be the one which has the least error, or the least difference

between the estimated value and the actual value.

All right. In other words, we

have to minimize the error. This was a brief understanding of the linear regression

algorithm. Soon we'll jump to the

mathematical implementation. But before then, let me tell you

this. Suppose you draw a graph with speed on the x-axis and distance covered on the y-axis, with

the time held constant. If you plot a graph

between the speed of the vehicle and the distance traveled

in a fixed unit of time, then you will get

a positive relationship.

All right. So suppose the equation

of the line is y equals mx plus c. Then in this case y is

the distance traveled in a fixed duration of time, x is the speed of the vehicle, m

is the positive slope of the line, and c is

the y-intercept of the line. All right. Now suppose

the distance remains constant, and you have to plot a graph

between the speed of the vehicle and the time taken

to travel a fixed distance. Then in that case

you will get a line with a negative relationship. All right, the slope of the line

is negative. Here the equation of the line changes to y

equals minus mx plus c, where y is the time

taken to travel a fixed distance, x is the speed of the vehicle, minus m is

the negative slope of the line, and c is

the y-intercept of the line.

All right. Now let's get back to our independent

and dependent variables. So in those terms,

y is our dependent variable and x is

our independent variable. Now let's move on and see the mathematical

implementation of these things. All right, so we have x equal to 1, 2, 3, 4, 5. Let's plot

them on the x-axis, so 0, 1, 2, 3, 4, 5, 6 along the axis,

and we have y as 3, 4, 2, 4, 5. All right. So let's mark 1, 2, 3, 4, 5 on the y-axis. Now let's plot our coordinates one

by one. x equals 1 and y equals 3, so this is the point

(1, 3). Similarly we have (1, 3), (2, 4), (3, 2), (4, 4), and (5, 5). All right, so moving on ahead, let's calculate the mean of X

and Y and plot it on the graph. The mean of X is 1 plus 2 plus 3 plus 4

plus 5, divided by 5. That is 3. Similarly the mean

of Y is 3 plus 4 plus 2 plus 4 plus 5, that is 18, divided by 5. That is nothing but 3.6. All right, so next

what we'll do is plot the mean, that is (3,

3.6), on the graph. Okay. So there's the point (3, 3.6). Our goal is to find

or predict the best fit line using the least squares

method. All right.

So in order to find that, we first need to find

the equation of the line. So let's find the equation

of our regression line. All right, let's suppose

this is our regression line, y equals mx plus c. Now we have an equation of the line, so all we need to do

is find the values of m and c, where m equals

the summation of (x minus x bar) times (y minus y bar),

upon the summation of (x minus x bar) whole square, that is, m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)².

Don't get confused. Let me work it out for you. All right. So moving on ahead,

as a part of the formula, what we are going to do

is calculate x minus x bar. We have x as 1 and x bar

as 3, so 1 minus 3, that is minus 2. Next we have x equal

to 2 minus its mean 3, that is minus 1.

Similarly, 3 minus 3 is 0, 4 minus 3 is 1, and 5 minus 3 is 2. All right, so x minus x bar

is nothing but the distance

of each point from the line x equals 3. And what does y minus y bar imply?

It implies the distance of each point from the line y equals 3.6. Fine. So let's calculate the value

of y minus y bar. Starting with y equal to 3, minus the value of y bar,

that is 3.6, it is 3 minus 3.6, which is minus 0.6. Next is 4 minus 3.6,

that is 0.4. Next is 2 minus 3.6, that is minus 1.6. Next is 4 minus 3.6,

that is 0.4 again, and 5 minus 3.6 is 1.4. All right, so now we are done

with y minus y bar. Fine. Next we will calculate (x

minus x bar) whole square.

It is minus 2 whole square, that is

4; minus 1 whole square, that is 1; 0 squared is

0; 1 squared is 1; and 2 squared is 4. Fine. So now in our table we have x

minus x bar, y minus y bar, and (x minus x bar) whole square.

Now what we need is the product of (x

minus x bar) times (y minus y bar). All right, so let's see

the product of (x minus x bar) times (y minus y bar). That is minus

2 times minus 0.6, which is 1.2; minus

1 times 0.4, which is minus 0.4; 0 times

minus 1.6, which is 0; 1 multiplied

by 0.4, which is 0.4; and next, 2 multiplied

by 1.4, which is 2.8. All right. Now almost all the parts

of our formula are done. So now what we need

to do is get the summation of the last two columns. All right, so the summation of (x

minus x bar) whole square is 10, and the summation of (x minus x bar) times (y minus y bar) is 4. So the value of m

will be equal to 4 by 10. Fine. So let's put this value of m equals 0.4

into our line y equals mx plus c. Let's fill the mean point

into the equation and find the value of c.
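The table we just filled in by hand can be reproduced with a few lines of plain Python. This is only a sketch of the least squares arithmetic on the five example points; the variable names are illustrative, and the intercept line uses the mean-point substitution that is worked through next.

```python
# The five example points from the worked table.
xs = [1, 2, 3, 4, 5]
ys = [3, 4, 2, 4, 5]

x_bar = sum(xs) / len(xs)   # mean of x
y_bar = sum(ys) / len(ys)   # mean of y

# Columns of the table: (x - x_bar) and (y - y_bar).
dx = [x - x_bar for x in xs]
dy = [y - y_bar for y in ys]

# Sums of the last two columns: (x - x_bar)^2 and (x - x_bar)(y - y_bar).
sum_dx_sq = sum(d * d for d in dx)
sum_dx_dy = sum(a * b for a, b in zip(dx, dy))

# Slope from the least squares formula, intercept from the mean point.
m = sum_dx_dy / sum_dx_sq
c = y_bar - m * x_bar

predicted = [m * x + c for x in xs]
print(round(m, 2), round(c, 2))            # 0.4 2.4
print([round(p, 1) for p in predicted])    # [2.8, 3.2, 3.6, 4.0, 4.4]
```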

We use the mean point because the least squares line always passes through it. So we have y as 3.6, remember,

the mean of y, m as 0.4, which we calculated just

now, and x as the mean value of x, that is 3. So we have the equation as

3.6 equals 0.4 multiplied by 3 plus c. That is 3.6 equals

1.2 plus c. So what is the value of c?

It is 3.6 minus 1.2, that is 2.4. All right. So we had m equal to

0.4 and c as 2.4. And then finally, when we write the equation

of the regression line, what we get is y equals

0.4 times x plus 2.4. So this is the regression line. All right, so this is how you plot

your points; these are the actual points. Now, for the given m equal to

0.4 and c equal to 2.4, let's predict the value of y

for x equal to 1, 2, 3, 4, and 5. When x equals

1, the predicted value of y will be 0.4 times 1 plus 2.4,

that is 2.8. Similarly, when x equals

2, the predicted value of y will be 0.4 times 2 plus 2.4, which equals

3.2. Similarly, for x equal to 3, y will be

3.6; for x equal to 4, y will be 4.0; and for x equal to 5, y will be

4.4. So let's plot them on the graph. The line passing through

all these predicted points and cutting the y-axis at 2.4

is the line of regression. Now your task is to calculate

the distance between the actual and the predicted value, and your job is

to reduce that distance. In other words, you have to reduce the error

between the actual and the predicted value. The line with the least error will be

the line of linear regression, or the regression line, and it will also be

the best fit line. All right. So this is how things

work in the computer. It performs

n iterations for different values of m.

For each value of m, it will calculate

the equation of the line, y equals mx plus c. As the value

of m changes, the line changes. The iteration

will start from one.

All right, and it will perform

n iterations. After every iteration, it

will calculate the predicted value according to the line

and compare the distance of the actual value

to the predicted value. The value of m for which the distance

between the actual and the predicted value is

minimum will be selected for the best fit line. All right. Now that we have calculated

the best fit line, it's time to check the goodness

of fit, or to check how well the model is performing. In order to do that, we have a method

called the R-squared method. So what is this R squared? Well, the R-squared value is

a statistical measure of how close the data are to the fitted regression

line.

In general, it is considered that a model with a high R-squared

value is a good model, but you can also have

a low R-squared value for a good model, or a high R-squared

value for a model that does not fit at all. All right. It is also known as the

coefficient of determination, or the coefficient

of multiple determination. Let's move on and see

how R squared is calculated.

So these are our actual values

plotted on the graph. We had calculated

the predicted values of y as 2.8, 3.2, 3.6, 4.0, and 4.4. Remember, we calculated

the predicted values of y from the equation y

predicted equals 0.4 times x plus 2.4

for every x equal to 1, 2, 3, 4, and 5. From there we got the predicted values

of y. All right. So let's plot them on the graph.

So these are the points,

and the line passing through these points is nothing

but the regression line. All right. Now what you need to do is check and compare

the distance of actual minus mean versus the distance

of predicted minus mean. So basically what you are doing is

comparing the distance of the actual value to the mean

with the distance of the predicted value to the mean. That is nothing but R squared. Mathematically

you can represent R squared as the summation of (y

predicted minus y bar) whole square, divided

by the summation of (y minus y bar) whole square, where y is the actual value,

y p is the predicted value, and y bar is the mean value of y,

that is nothing but 3.6. Remember, this is our formula. So next what we'll do is

calculate y minus y bar. We have y as

3 and y bar as 3.6, so we'll calculate

it as 3 minus 3.6, that is nothing but minus 0.6.

Similarly, for y equal to 4 and y bar equal to 3.6,

we have y minus y bar as

0.4. Then 2 minus 3.6 is minus 1.6, 4 minus

3.6 is again 0.4, and 5 minus 3.6 is 1.4. So we got the values

of y minus y bar. Now what we have to do is

take their squares. So we have minus 0.6 squared

as 0.36, 0.4 squared as 0.16, minus 1.6 squared as 2.56, 0.4 squared

as 0.16, and 1.4 squared as 1.96. Now, as a part

of the formula, we need our y p

minus y bar values. So these are the y p values, and we

have to subtract the mean of y from them. So 2.8 minus 3.6,

that is minus 0.8. Similarly, we get 3.2 minus 3.6,

that is minus 0.4; 3.6 minus 3.6, that is 0; 4.0 minus

3.6, that is 0.4; then 4.4 minus 3.6, that is 0.8. So we calculated the values

of y p minus y bar. Now it's our turn to calculate

the value of (y p minus y bar) whole square. We have minus 0.8 squared as 0.64, minus 0.4

squared as 0.16, 0 squared as 0, 0.4 squared as again 0.16,

and 0.8 squared as 0.64.

All right. Now, as a part of the formula, it suggests

taking the summation of (y p minus y bar) whole square and the summation of (y minus

y bar) whole square. All right. Let's see. On summing (y

minus y bar) whole square, what you get is 5.2,

and summing (y p minus y bar) whole square, you

get 1.6. So the value of R squared

can be calculated as 1.6 upon 5.2. Fine. So the result which we get

is approximately equal to 0.3.
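The same R-squared arithmetic can be checked in plain Python. This is a small sketch on the five example points; besides the ratio used above, it also computes the "1 minus residual sum of squares over total sum of squares" form, which gives the same number for a least squares line with an intercept. The variable names are illustrative.

```python
ys = [3, 4, 2, 4, 5]                      # actual values
predicted = [2.8, 3.2, 3.6, 4.0, 4.4]     # values from y = 0.4x + 2.4

y_bar = sum(ys) / len(ys)

# Sum of (y - y_bar)^2, about 5.2
ss_total = sum((y - y_bar) ** 2 for y in ys)
# Sum of (y_p - y_bar)^2, about 1.6
ss_explained = sum((p - y_bar) ** 2 for p in predicted)
# Sum of (y - y_p)^2, about 3.6
ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))

r2_ratio = ss_explained / ss_total        # the form used in the walkthrough
r2_residual = 1 - ss_residual / ss_total  # the equivalent residual form
print(round(r2_ratio, 3), round(r2_residual, 3))   # 0.308 0.308
```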

Well, this is not a good fit. It suggests that the data points are far

away from the regression line. All right, so this is how your graph will look

when R squared is 0.3. When you increase the value

of R squared to 0.7, you'll see that the actual values lie

closer to the regression line. When it reaches 0.9, they come even closer, and when the value

is approximately equal to 1, the actual values lie

on the regression line itself. On the other hand, if you get a very low value

of R squared, suppose 0.02, then you will see that the actual values

are very far away from the regression line,

or you can say that there are too

many outliers in your data. You cannot forecast

anything from the data. All right. So this was all about the

calculation of R squared. Now you might have a question

like: are low values of R squared always bad? Well, in some fields it

is entirely expected that the R-squared value will be low.

For example, any field that attempts to predict human

behavior, such as psychology, typically has R-squared values

lower than around 50%, from which you can conclude that humans are simply harder to predict than

physical processes. Furthermore, if your R-squared value is low but you have statistically

significant predictors, then you can still

draw important conclusions about how changes in the

predictor values are associated with changes

in the response value. Regardless of the R squared, the significant coefficients

still represent the mean change in the response for one unit

of change in the predictor while holding the other predictors

in the model constant. Obviously this type of information can be

extremely valuable.

All right. So this was all about

the theoretical concept. Now let's move on to the coding part and understand the

code in depth. For implementing

linear regression using Python, I will be using Anaconda with Jupyter Notebook

installed on it. So here is

a Jupyter notebook, and we are using

Python 3.0 on it. All right, so we are going

to use a data set consisting of the head size and brain weight

of different people.

All right. So let's import our libraries first:

%matplotlib inline, then we are importing numpy as np, pandas as pd, and

matplotlib, and from matplotlib we are importing pyplot

as plt. All right. Next we will import

our data, headbrain.csv, and store it

in the data variable. Let's hit the Run button

and see the output. This asterisk

symbol shows that it is still executing. So here is the output.

Our data set consists of 237 rows

and 4 columns. We have the columns

gender, age range, head size in cubic centimeters, and brain weight

in grams. Fine. So this is our sample data set. This is how it looks.

So now that we

have imported our data, as you can see there are

237 values in the training set, so we can find a linear relationship between the head

size and the brain weight.

So now what we'll do is

collect X and Y. X will consist

of the head size values and Y will consist

of the brain weight values. So that's collecting X and Y.
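Here is a small sketch of what this loading step might look like, assuming pandas is installed. The exact column names are an assumption based on how the columns are described above, and the three data rows are made up so the sketch runs without the real file; with the real data you would pass the path to headbrain.csv instead.

```python
import io
import pandas as pd

# Column names are assumptions based on the description of the file.
HEAD_COL = "Head Size(cm^3)"
BRAIN_COL = "Brain Weight(grams)"

def load_xy(csv_source):
    """Read the CSV and return the full table plus the X and Y arrays."""
    data = pd.read_csv(csv_source)
    X = data[HEAD_COL].values    # head size values
    Y = data[BRAIN_COL].values   # brain weight values
    return data, X, Y

# Made-up rows standing in for headbrain.csv (not real measurements).
sample_csv = io.StringIO(
    "Gender,Age Range,Head Size(cm^3),Brain Weight(grams)\n"
    "1,1,4000,1400\n"
    "1,2,3500,1250\n"
    "2,1,3800,1300\n"
)
data, X, Y = load_xy(sample_csv)
print(data.shape)   # (rows, columns)
```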

Let's execute it. Done. Next, we

need to find the values of b1 and b0, or you can say m and c. So we'll need the mean of the X

and Y values. First of all, we'll calculate

the mean of X and Y. So mean_x equals np.mean(X); mean is a predefined function

of NumPy. Similarly, mean_y equals

np.mean(Y), which will return

the mean value of Y. Next we'll check

the total number of values.

So m equals the length of X. All right, then we'll use the formula

to calculate the values of b1 and b0, or m and c. Let's hit

the Run button and see the result. As you can see

here on the screen, we have got b1 as 0.263 and b0 as

325.57. All right, so now

we have the coefficients. Comparing with

the equation y equals mx plus c, you can say

that brain weight equals 0.263 times head size

plus 325.57. So you can say that the value of m

here is 0.263 and the value of c here is

325.57. All right, so there's

our linear model. Now let's plot it

and see it graphically. Let's execute it. So this is how our plot looks.

This model is not so bad, but we need to find out

how good our model is.
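The coefficient step described above can be sketched with NumPy as follows. This is not the notebook's exact code, only one way to write the same least squares formula; the function name fit_line is hypothetical, and the head sizes below are made-up numbers generated from a known line so that the result can be checked.

```python
import numpy as np

def fit_line(X, Y):
    """Return (b1, b0) for the least squares line Y = b1 * X + b0."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    mean_x = np.mean(X)
    mean_y = np.mean(Y)
    # b1 = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    b1 = np.sum((X - mean_x) * (Y - mean_y)) / np.sum((X - mean_x) ** 2)
    # The line passes through the mean point, which gives b0.
    b0 = mean_y - b1 * mean_x
    return b1, b0

# Made-up head sizes; the weights come from a known line (0.25x + 300).
head_size = np.array([3000.0, 3500.0, 4000.0, 4500.0])
brain_weight = 0.25 * head_size + 300.0

b1, b0 = fit_line(head_size, brain_weight)
print(b1, b0)
```

Calling fit_line on the earlier hand-worked points, x = 1 to 5 and y = 3, 4, 2, 4, 5, should give a slope of about 0.4 and an intercept of about 2.4.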

In order to find

that, there are many methods, like the root mean square error,

or the coefficient of determination, that is the R-squared method. In this tutorial, I have told you

about the R-squared method, so let's focus on that and see

how good our model is. So let's calculate

the R-squared value. All right, here ss_t

is the total sum of squares, ss_r

is the sum of squares

of the residuals, and the formula for R squared is

1 minus the sum of squares of the residuals upon the total sum

of squares. For a least squares line this gives the same value as the ratio we calculated by hand earlier. All right. Next,

when you execute it, you will get the value

of R squared as 0.63, which is pretty good. Now that you have implemented

a simple linear regression model using the least squares method, let's move on and see how you will implement the model

using a machine learning library called scikit-learn. All right. So scikit-learn

is a simple machine learning library in Python. Building

machine learning models is very easy using scikit-learn.
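As a rough idea of how short the scikit-learn version can be, here is a minimal sketch, assuming scikit-learn and NumPy are installed. It uses the five hand-worked points from earlier rather than the head and brain data, so the numbers can be compared with the manual calculation; it is not the notebook's exact code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# scikit-learn expects X as a 2-D array: one row per sample, one column per feature.
X = np.array([1, 2, 3, 4, 5], dtype=float).reshape(-1, 1)
Y = np.array([3, 4, 2, 4, 5], dtype=float)

reg = LinearRegression().fit(X, Y)   # fit the model
Y_pred = reg.predict(X)              # predicted values

slope = reg.coef_[0]                 # m, about 0.4
intercept = reg.intercept_           # c, about 2.4
r2 = r2_score(Y, Y_pred)             # same value as reg.score(X, Y)
print(slope, intercept, r2)
```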

So suppose this is

the Python code. Using the scikit-learn

library, your code shortens to this length. Let's hit the Run button, and you

will get the same R2 score. So today we'll be discussing

logistic regression. So let's move forward and understand the what and why

of logistic regression. Now this algorithm

is most widely used when the dependent variable, or you can say the output, is

in a binary format. So here you need

to predict the outcome of a categorical

dependent variable.

So the outcome should

always be discrete or categorical in nature. By discrete, I mean the value

should be binary, or you can say you just have

two values: it can either be 0 or 1, yes or no, true

or false, or high or low. Only these can be

the outcomes. So the value which you need to predict

should be discrete, or you can say

categorical, in nature, whereas in linear regression we have the value of y,

or you can say the value you need to predict, in a range. That is the difference

between linear regression and logistic regression. You must be having a question: why not linear regression? Now,

guys, in linear regression the value of y, or the value which you need

to predict, is in a range, but in our case, as

in logistic regression, we just have two values:

it can be either 0 or it can be 1.

It should not entertain

values which are below 0 or above 1. But in linear regression, we have the value of y

in a range. So here, in order to implement logistic regression, we need to clip this part.

We don't need the values that are below 0

and we don't need the values which are above 1. Since the value of y will be

only between 0 and 1, which is the main rule

of logistic regression, the linear line has to be

clipped at 0 and 1. Now, once we clip this graph, it

would look somewhat like this.
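The clipping idea can be sketched in a couple of lines of Python. The slope and intercept below are made-up values chosen only so that the line crosses both 0 and 1; the function name clipped_line is hypothetical.

```python
def clipped_line(x, m=0.5, c=0.5):
    """Straight line y = m*x + c, clipped so the result stays within [0, 1]."""
    y = m * x + c
    return max(0.0, min(1.0, y))

# Below 0 the line is flattened to 0, above 1 it is flattened to 1,
# which gives the three straight pieces seen on the graph.
for x in (-3, -1, 0, 1, 3):
    print(x, clipped_line(x))
```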

So here you are

getting a curve which is nothing but

three different straight lines. We need

a new way to solve this problem. This has to be

formulated into an equation, and hence we come up

with logistic regression. Here the outcome

is either 0 or 1, which is the main rule

of logistic regression. The clipped curve

cannot be formulated as a single equation, but with logistic regression our main aim

of bringing the values between 0 and 1 is fulfilled. So that is how we came up with

logistic regression. Now, once it gets formulated

into an equation, it looks somewhat like this. So guys, this is

nothing but an S curve, or you can say the sigmoid curve,

a sigmoid function curve. This sigmoid function

basically converts any value from minus infinity to infinity

into a value between 0 and 1, from which logistic regression

gets the discrete values it wants, or you can say the values which are in binary

format, either 0 or 1.
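To see what the S curve does to a few numbers, here is a minimal sketch of the standard sigmoid function, 1 / (1 + e^(-z)), in plain Python. The sample inputs are arbitrary.

```python
import math

def sigmoid(z):
    """Map a real number z to a value strictly between 0 and 1.

    Fine for moderate z; a very large negative z would overflow math.exp.
    """
    return 1.0 / (1.0 + math.exp(-z))

# Large negative inputs land near 0, large positive inputs near 1,
# and 0 lands exactly in the middle at 0.5.
for z in (-10, -1, 0, 1, 10):
    print(z, round(sigmoid(z), 4))
```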

So if you see here,

the values are either 0 or 1, and this is nothing

but just a transition between them. But guys, there's

a catch over here. Let's say I have

a data point that is 0.8. Now, how can you decide whether your value is 0 or 1? Here you

have the concept of a threshold, which basically

divides your line. The threshold value basically indicates the

probability of either winning or losing. By winning I mean the value is equal to 1, and by losing I mean

the value is equal to 0. But how does it do that? Let's have a data point

which is over here.

Let's say my cursor is at 0.8. Here I check whether this value is less

than the threshold value or not. If it is more

than the threshold value, it should give me the result

as 1. If it is less than that, then it should give me

the result as 0. Here my threshold

value is 0.5. I need to define that

if my value, let's say 0.8, is more than 0.5, then the value should

be rounded off to 1.
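The threshold rule, covering both the case above the threshold and the case below it, can be sketched like this. The function name classify is hypothetical, and sending a value exactly equal to the threshold to 1 is an assumption, since the walkthrough only covers values above or below it.

```python
def classify(probability, threshold=0.5):
    """Turn a value between 0 and 1 into a discrete 0 or 1."""
    # Assumption: a value exactly at the threshold is counted as 1.
    return 1 if probability >= threshold else 0

print(classify(0.8))   # 1, because 0.8 is above the threshold
print(classify(0.2))   # 0, because 0.2 is below the threshold
```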

Let's say it is

less than 0.5. Say I have a value of 0.2; then

it should be reduced to 0. So here you can use the concept of the threshold value

to find the output. It should be discrete: it should be either 0

or it should be 1. So I hope you got this curve

of logistic regression. So guys, this is

the sigmoid S curve. To make this curve

we need to make an equation, so let me address

that part as well. Let's see how an equation

is formed to imitate this functionality so over here, we have an equation

of a straight line. It is y is equal to MX plus C. So in this case, I just have only one independent

variable but let's say if we have many

independent variable then the equation becomes m 1 x

1 plus m 2 x 2 plus m 3 x 3 and so on till M NX n now, let us put in B and X.

So here the equation

becomes Y is equal to b 1 x 1 plus beta 2 x 2 plus b 3 x 3 and so on

till be nxn plus C. So guys the equation of the straight line has a range

from minus infinity to Infinity. But in our case or you can say

largest equation the value which we need to predict

or you can say the Y value it can have the range

only from 0 to 1. So in that case we need

to transform this equation. So to do that what we had done we have just divide

the equation by 1 minus y so now Y is equal to 0 so 0 over 1 minus 0

which is equal to 1 so 0 over 1 is again 0 and if you take Y is equals

to 1 then 1 over 1 minus 1 which is 0 so 1 over 0 is infinity. So here my range is now

between You know to Infinity, but again, we want the range

from minus infinity to Infinity. So for that what we'll do we'll have

the log of this equation. So let's go ahead

and have the logarithmic of this equation. So here we have this transform

it further to get the range between minus infinity

to Infinity so over here we have log of Y over 1 minus 1 and this is your final

logistic regression equation.
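Written out, the steps just described look like this. The symbols b and c follow the spoken equation; the last line, solving back for y, is an added step that shows where the sigmoid curve comes from.

```latex
\begin{align*}
\text{straight line:}\quad & b_1 x_1 + b_2 x_2 + \dots + b_n x_n + c \in (-\infty, \infty) \\
\text{odds:}\quad & \frac{y}{1-y} \in [0, \infty) \quad \text{for } y \in [0, 1) \\
\text{log-odds:}\quad & \log\frac{y}{1-y} \in (-\infty, \infty) \\
\text{logistic regression:}\quad & \log\frac{y}{1-y} = b_1 x_1 + b_2 x_2 + \dots + b_n x_n + c \\
\text{solved for } y\text{:}\quad & y = \frac{1}{1 + e^{-(b_1 x_1 + b_2 x_2 + \dots + b_n x_n + c)}}
\end{align*}
```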

So guys, don't worry. You don't have to write this formula or memorize it. In Python, you just need to call a function, which is logistic regression, and everything will be done automatically for you. I don't want to scare you with the maths and the formulas behind it, but it is always good to know how this formula was generated. So I hope you guys are clear on how logistic regression comes into the picture. Next, let us see the major differences between linear regression and logistic regression. First of all, in linear regression the value of y is a continuous variable, or you can say the variable we need to predict is continuous in nature. Whereas in logistic regression we have a categorical variable, so the value you need to predict should be discrete in nature. It should be either 0 or 1, or it should have just two values. For example, whether it is raining or not raining, or whether it is humid outside or not humid outside.

Or is it going to snow or not going to snow? So these are a few examples where the values we need to predict are discrete, or where you just predict whether something is happening or not. Next, linear regression solves your regression problems. Here you have the concept of an independent variable and a dependent variable, and you calculate the value of y, which you need to predict, using the value of x. So your y variable, or you can say the value that you need to predict, lies in a range.

Whereas in logistic regression you have discrete values. So logistic regression basically solves a classification problem: it classifies, and it just gives you the result of whether an event is happening or not. So I hope it is pretty much clear till now. Next, in linear regression the graph that you have seen is a straight line, so you can calculate the value of y with respect to the value of x. Whereas in logistic regression the graph that we got was an S curve, or you can say the sigmoid curve. So using the sigmoid function you can predict your y values. Moving ahead, let us see the various use cases where logistic regression is implemented in real life. The very first is weather prediction. Logistic regression helps you to predict your weather.

For example, it is used to predict whether it is raining or not, whether it is sunny or not, whether it is cloudy or not. All these things can be predicted using logistic regression. But you need to keep in mind that both linear regression and logistic regression can be used in predicting the weather. In that case, linear regression helps you to predict what the temperature will be tomorrow, whereas logistic regression will only tell you whether it is going to rain or not, whether it is cloudy or not, whether it is going to snow or not. So those values are discrete. Whereas if you apply linear regression, you will be predicting things like what the temperature will be tomorrow or the day after tomorrow, and so on. So these are the slight differences between linear regression and logistic regression. Moving ahead, we have classification problems. So it also performs multi-class classification. Here it can help you tell whether it's a bird or not a bird. Then you can classify different kinds of mammals.

Let's say whether it's a dog or not a dog. Similarly, you can check for a reptile, whether it's a reptile or not a reptile. So logistic regression can perform multi-class classification. This point I've already discussed, that it is used in classification problems. Next, it also helps you to determine illness. Let me take an example. Let's say a patient goes for a routine check-up in a hospital. The doctor will perform various tests on the patient and check whether the patient is actually ill or not. So what will be the features? The doctor can check the sugar level, the blood pressure, and the age of the patient, whether it is a very young or an old person. Then there is the previous medical history of the patient. All of these features will be recorded by the doctor, and finally the doctor checks the patient data and determines the outcome of the illness and the severity of the illness.

So using all the data, a doctor can identify whether a patient is ill or not. So these are the various use cases in which you can use logistic regression. Now, I guess that's enough of the theory part. So let's move ahead and see some practical implementations of logistic regression. Over here, I'll be implementing two projects. In the first, I have the Titanic data set, where we'll predict what factors made people more likely to survive the sinking of the Titanic. In my second project, we'll see the data analysis on SUV cars: we have data on who purchased an SUV and what factors made people more interested in buying one. So these will be the major questions as to why you should implement logistic regression and what output you will get from it. Let's start with the very first project, that is, Titanic data analysis.

So some of you might know that there was a ship called the Titanic, which hit an iceberg and sank to the bottom of the ocean. It was a big disaster at that time because it was the first voyage of the ship, and it was supposed to be really, really strongly built and one of the best ships of that time.

So it was a big disaster, and of course there is a movie about it as well, so many of you might have watched it. What we have is data on the passengers, those who survived and those who did not survive this tragedy. What you have to do is look at this data and analyze which factors contributed the most to a person's chances of survival on the ship. So using logistic regression, we can predict whether a person survived or died. Apart from this, we'll also have a look at the various features. So first, let us explore the data set. Over here we have the index value, then the first column is passenger ID. My next column is Survived. Here we have two values, 0 and 1: 0 stands for did not survive and 1 stands for survived.

So this column is categorical, where the values are discrete. Next, we have passenger class, with three values: 1, 2 and 3. This tells you whether a passenger is traveling in first class, second class or third class. Then we have the name of the passenger. We have the sex, or you can say the gender, of the passenger, male or female. Then we have the age. Then we have SibSp, which means the number of siblings or spouses aboard the Titanic, with values such as 1, 0 and so on. Then we have Parch, which is the number of parents or children aboard the Titanic, so here also we have some values. Then we have the ticket number, the fare, the cabin number, and the Embarked column.

In my Embarked column we have three values: S, C and Q. S stands for Southampton, C stands for Cherbourg and Q stands for Queenstown. So these are the features that we'll be applying our model on. We'll perform various steps and then implement logistic regression. These are the steps required to implement any algorithm, and in our case we are implementing logistic regression. So the very first step is to collect your data, or to import the libraries that are used for collecting your data and taking it forward. My second step is to analyze your data. Over here I can go through the various fields and analyze the data. I can check whether the females or children survived better than the males, whether the rich passengers survived more than the poor passengers, or whether money mattered, as in whether those who paid more to get onto the ship were evacuated first. And what about the workers? Did the workers survive, or what was the survival rate if you were a worker on the ship and not just a traveling passenger? All of these are very, very interesting questions, and you would be going through them one by one.

So in this stage you need to analyze your data and explore it as much as you can. My third step is to wrangle your data. Data wrangling basically means cleaning your data: you can remove the unnecessary items, or if you have null values in the data set, you can clear them out and then take the data forward. Then, in the fourth step, you build your model using the train data set and test it using the test data set. Here you'll perform a split, which divides your data set into a training and a testing data set. And finally, you check the accuracy, so as to see how accurate your predictions are. So I hope you guys got these five steps that you're going to implement in logistic regression.

So now let's go into all these steps in detail. Number one: we have to collect the data, or you can say import the libraries. Let me show you the implementation part as well. I'll just open my Jupyter notebook and implement all of these steps side by side. So guys, this is my Jupyter notebook. First, let me rename the notebook to, let's say, Titanic Data Analysis. Now, our first step was to import all the libraries and collect the data, so let me import the libraries first. First of all, I'll import pandas, which is used for data analysis. So I'll say import pandas as pd. Then I'll import numpy, so I'll say import numpy as np. NumPy is a library in Python which stands for Numerical Python, and it is widely used for scientific computation. Next, we'll import seaborn, which is a library for statistical plotting, so I'll say import seaborn as sns. I'll also import matplotlib, which is again for plotting.

So I'll say import matplotlib.pyplot as plt. Now, to show the plots in the Jupyter notebook, all I have to write is %matplotlib inline. Next, I'll import one more module, so as to calculate basic mathematical functions, so I'll say import math. So these are the libraries that I'll be needing in this Titanic data analysis.

So now let me import my data set. I'll take a variable, let's say titanic_data, and using pandas I'll read my CSV, or you can say the data set, passing the name of my data set, that is Titanic.csv. I have already shown you the data set, so let me just print the top 10 rows. For that, I'll take the variable and say titanic_data.head with 10. Now I'll run this; to run it, I just have to press Shift+Enter, or else you can run the cell directly. Over here I have the index. We have the passenger ID, which is nothing but again an index, starting from 1. Then we have the Survived column, which has categorical values, or you can say discrete values, in the form of 0 or 1.

Then we have the passenger class. We have the name of the passenger, sex, age and so on. So this is the data set that I'll be going forward with. Next, let us print the number of passengers in this original data frame. For that, I'll simply type print, give it a label saying number of passengers, and use the len function to calculate the total length. So I'll say len, and inside it I'll pass the variable titanic_data, which I'll copy from here and paste, followed by .index. Let me print this. So the number of passengers in the original data set is 891, which means this data set has records for 891 of the people who were traveling on the Titanic. So my first step is done: we have collected the data, imported all the libraries and found the total number of passengers in the data. Now let me go back to the presentation and let's see.
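The loading step described above can be sketched as follows. Since Titanic.csv is not available here, the sketch reads a tiny made-up stand-in from a string; the four rows are placeholders, not real passenger records, and with the real file the call would be `pd.read_csv("Titanic.csv")`.

```python
import io

import pandas as pd

# Made-up stand-in rows so the sketch runs without the real Titanic.csv file.
csv_text = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
4,0,2,male,
"""

titanic_data = pd.read_csv(io.StringIO(csv_text))

# head(10) shows up to the first ten rows of the data frame.
print(titanic_data.head(10))

# len() of the index gives the number of rows, one per passenger.
print("Number of passengers in this data: " + str(len(titanic_data.index)))
```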

What is my next step? We're done with collecting data. The next step is to analyze your data. Over here we'll create different plots to check the relationship between variables, as in how one variable affects another. You can explore your data set by making use of the various columns and plotting graphs between them. You can plot a correlation graph, or you can plot a distribution curve; it's up to you guys. So let me go back to my Jupyter notebook and analyze some of the data. Over here, my second part is to analyze data, so I'll put this in heading 2. To do that, I just have to go to Code, click on Markdown, and run this. First, let us plot a count plot where you can compare the passengers who survived and who did not survive. For that I'll be using the seaborn library. I have imported seaborn as sns, so I don't have to write the whole name.

I'll simply say sns.countplot. I'll set x to Survived, and the data I'll be using is titanic_data, or you can say the name of the variable in which you have stored your data set. Now let me run this. Over here, as you can see, I have the Survived column on my x-axis, and on the y-axis I have the count. 0 stands for did not survive and 1 stands for the passengers who did survive. You can see that around 550 of the passengers did not survive and only around 350 passengers survived. So here you can conclude that there are far fewer survivors than non-survivors. So this was the very first plot. Now let us draw another plot to compare the sexes among the passengers who survived and who did not survive.

How many were men and how many were women? To do that, I'll simply say sns.countplot and add the hue as Sex, since I want to know how many females and how many males survived. Then I'll specify the data, so I'm using the titanic_data set. Let me run this. I have made a mistake over here. So over here you can see I have the Survived column on the x-axis and the count on the y-axis. The blue color stands for your male passengers and orange stands for your female passengers. As you can see, among the passengers who did not survive, that is value 0, the majority were male, and if we look at the people who survived, the majority were female. So this basically sums up the survival rate by gender. It appears that on average women were more than three times more likely to survive than men.
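As a rough sketch of what this count plot shows, the numbers behind the bars can be computed with a pandas cross-tabulation. The six rows below are made up for illustration, and the helper name `draw_count_plot` is hypothetical; it wraps the seaborn call and is only defined here, not run.

```python
import pandas as pd

# Made-up rows for illustration only, not real passenger records.
titanic_data = pd.DataFrame({
    "Survived": [0, 0, 0, 1, 1, 0],
    "Sex": ["male", "male", "female", "female", "female", "male"],
})

# The same counts that a count plot of Survived with hue=Sex draws as bars.
counts = pd.crosstab(titanic_data["Survived"], titanic_data["Sex"])
print(counts)


def draw_count_plot(data):
    """Draw the bar chart itself; requires seaborn to be installed."""
    import seaborn as sns

    return sns.countplot(x="Survived", hue="Sex", data=data)
```

Swapping `"Sex"` for `"Pclass"` in either call gives the same breakdown by passenger class.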

Let us plot another plot where we have the hue as the passenger class, so we can see which class the passenger was traveling in, whether it was class 1, 2 or 3. For that I'll write the same command. I'll say sns.countplot, give my x-axis as Survived, and change my hue to the passenger class. The column is named Pclass.

And the data set I'll be using is titanic_data. So this is my result. Over here you can see I have blue for first class, orange for second class and green for third class. The passengers who did not survive were mostly from the third class, or you can say the lowest class, the cheapest class to get onto the Titanic, and the people who did survive mostly belonged to the higher classes. So classes 1 and 2 fared better than the passengers traveling in third class. So here we have found that the passengers who did not survive were mostly from third class, the lowest class, and the passengers traveling in first and second class tended to survive more. Next, I'll plot a graph for the age distribution. Over here I can simply use my data, and we'll be using the pandas library for this. I'll take titanic_data and pass in the column, that is Age, in square brackets. Then I'll say plot, and I want a histogram.

So I'll say plot.hist. You can notice over here that we have quite a lot of young passengers, or you can say children between the ages of 0 and 10, then we have the average-aged people, and as you go further the population gets smaller. So this is the analysis on the Age column: we saw that we have more young and middle-aged passengers traveling on the Titanic. Next, let me plot a graph of the fare as well. I'll say titanic_data, then Fare, and again I want a histogram, so I'll say hist. Here you can see that most fares are between 0 and 100. Now let me add the bin size so as to make it clearer. I'll say bins is equal to, let's say, 20, and I'll increase the figure size as well, so I'll say figsize and give the dimensions as 10 by 5. So the argument is bins. So this is clearer now. Next, let us analyze the other columns as well. I'll just type titanic_data.info to see what columns are left. So here we have passenger ID, which I guess is of no use. Then we have Survived: we saw how many passengers survived and how many did not, and we also did the analysis on a gender basis, where we saw whether females or males tended to survive more. Then we saw the passenger class, whether the passenger was traveling in first, second or third class. Then we have the name, on which we cannot do any analysis. We saw the sex, and we saw the age as well. Then we have SibSp, which stands for the number of siblings or spouses aboard the Titanic, so let us do this one as well.

So I'll say sns.countplot, with x as SibSp, and I'll be using titanic_data. You can see the plot over here. It has its maximum at zero, so we can conclude that most passengers had neither a sibling nor a spouse on board the Titanic. The second highest value is 1, and then we have smaller counts for 2, 3, 4 and so on. Similarly, you can do it for Parch. Next we have Parch, which is the number of parents or children aboard the Titanic, so you can do the same for it as well. Then we have the ticket number.

I don't think any analysis is required for Ticket. Then we have Fare, which we have already discussed, as in the people who travel in first class pay the highest fare. Then we have the cabin number, and we have Embarked. So these are the columns that we'll be doing data wrangling on. We have analyzed the data and seen quite a few graphs from which we can conclude which variable matters more than another, or what the relationship between them is. So my third step is data wrangling, and data wrangling basically means cleaning your data.

So if you have a large data set, you might have some null values, or you can say NaN values. It's very important that you remove all the unnecessary items that are present in your data set, because they directly affect your accuracy. So I'll go ahead and clean my data by removing the NaN values and the unnecessary columns which have null values in the data set. So next, we are performing data wrangling. First of all, I'll check whether my data set has nulls or not. I'll say titanic_data, which is the name of my data set, and I'll say isnull. This will tell me which values are null and return a Boolean result.

So this basically checks for missing data, and your result will be in Boolean format, as in True or False. False means the value is not null and True means it is null. Let me run this. Over here you can see the values as False or True: False is where the value is not null and True is where the value is null. You can see that in the Cabin column the very first value is null, so we have to do something about this. You can also see that we have a large data set, so the listing goes on and on; instead, we can actually take the sum of it. We can print the number of passengers who have a NaN value in each column. So I'll say titanic_data.isnull and I want the sum of it, so I'll add .sum. This prints the number of passengers who have NaN values in each column. We can see that we have missing values in the Age column, that is 177.
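Here is a minimal sketch of the missing-value check just described, run on a small made-up frame rather than the real data set, so the counts differ from the 177 seen in the session.

```python
import numpy as np
import pandas as pd

# Made-up rows with a few deliberately missing entries.
titanic_data = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", "S", "Q"],
})

# isnull() returns True where a value is missing and False elsewhere.
print(titanic_data.isnull())

# Summing the Booleans column by column counts the missing values per column.
missing_per_column = titanic_data.isnull().sum()
print(missing_per_column)
```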

Then we have the maximum number in the Cabin column, and we have very few in the Embarked column, that is 2. If you don't want to read these numbers, you can also plot a heat map and analyze it visually, so let me do that as well. I'll say sns.heatmap, pass in titanic_data.isnull, and set yticklabels to False. Let me run this. As we have already seen, there were three columns in which missing data was present. This one is Age, where almost 20% of the column is missing. Then we have the Cabin column, where the share is quite large, and then we have two values for the Embarked column as well. Let me add a cmap for color coding, so I'll say cmap. If I do this, the graph becomes more attractive. Over here yellow stands for True, or you can say the values that are null. So here we have found that we have missing values in Age, a lot of missing values in the Cabin column, and very few, which are not even visible, in the Embarked column. To deal with these missing values, you can either replace them, putting in some dummy values, or you can simply drop the column.

So here, let us pick the Age column. First, let me plot a box plot and analyze the Age column with it. I'll say sns.boxplot. I'll say x is equal to the passenger class, so it's Pclass. I'll say y is equal to Age, and the data set I'll be using is the Titanic one, so I'll say data is equal to titanic_data. You can see that the passengers in first class and second class tend to be older than those in third class.

Well, that depends on experience, on how much you earn, and there might be any number of reasons. So here we concluded that passengers traveling in class 1 and class 2 tend to be older than those in class 3. We have found that we have some missing values in Age. Now, one way is to just drop the column, or you can fill in some values for them; this method is called imputation. Now, to perform data wrangling or cleaning, let us first print the head of the data set. So I'll say titanic.head, or rather titanic_data.head, and let's say I just want five rows. So here we have Survived, which is again categorical, so on this particular column I can apply logistic regression.

So this can be my y value, or the value that you need to predict. Then we have the passenger class. We have the name, then the ticket number, Fare and Cabin. Over here we have seen that in Cabin we have a lot of null values, or you can say NaN values, which is quite visible as well. So first of all, we'll drop this column. For dropping it, I'll say titanic_data and simply type drop with the column which I need to drop, so I have to drop the Cabin column. I'll mention axis equal to 1, and I'll set inplace to True as well. Now I'll print the head again to see whether this column has been removed from the data set or not.

So I'll say titanic_data.head. As you can see here, we don't have the Cabin column anymore. Now, you can also drop the NA values. So I'll say titanic_data.dropna, which drops all the NA values, or you can say NaN, which is Not a Number, and I'll say inplace is equal to True. Let me run this. Over here, let me plot the heat map again and see whether the values which were earlier showing as null have been removed or not. So I'll say sns.heatmap, pass in the data set with isnull, and say yticklabels is equal to False. And I don't want the color bar, so I'll set cbar to False as well. This will help me check whether the null values have been removed from the data set or not.

So as you can see here, I don't have any null values; it's entirely black now. You can actually check the sum as well. I'll go above, copy this part, and use the sum function to calculate the sum. So this tells me that the data set is clean, as in the data set does not contain any null or NaN values. So now we have wrangled the data, or you can say cleaned the data. Here we have done just one step of data wrangling, that is, removing one column and the rows with missing values.
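The cleaning just described, dropping the Cabin column and then dropping the rows that still contain missing values, can be sketched like this on a small made-up frame:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
titanic_data = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", "S", np.nan],
})

# axis=1 means "drop a column"; inplace=True modifies titanic_data itself.
titanic_data.drop("Cabin", axis=1, inplace=True)

# Remove every row that still has at least one missing value.
titanic_data.dropna(inplace=True)

# Every count should now be zero.
print(titanic_data.isnull().sum())
```

Dropping rows is the simplest option but it throws data away; filling the gaps with a value such as the column mean (imputation) keeps those rows.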

Now, you can do a lot of other things: you can fill in the missing values with some other values, or you can calculate the mean and fill that in for the null values. But now if I look at my data set, so I'll say titanic_data.head, I have a lot of string values. These have to be converted to numeric dummy variables in order to implement logistic regression, and this can be done using pandas, because here logistic regression takes just two values. Whenever you apply machine learning, you need to make sure that there are no string values present, because the model won't take them as input variables. So with strings you can't predict anything. But in my case I have the Survived column, which tells me how many people survived and how many did not: 0 stands for did not survive and 1 stands for survived.

So now let me convert these variables into dummy variables. I'll use pandas and say pd.get_dummies. You can simply press Tab to autocomplete. I'll pass in titanic_data with the Sex column. You can also press Shift+Tab to get more information on this function. So here we have the type DataFrame, and we have the passenger ID, Survived and passenger class. If I run this, you'll see that under female, 0 stands for not a female and 1 stands for a female; similarly, under male, 0 stands for not male and 1 stands for male. Now, we don't require both of these columns, because one column by itself is enough to tell us whether the passenger is male or female. Let's say I keep only male: if the value of male is 1, the passenger is definitely a male and not a female. That is why you don't need both of these columns. So I'll remove the first column, that is female, by saying drop_first equal to True. It has given me just one column, which is male, with values 0 and 1. Let me store this in a variable, say sex, so over here I can say sex.head. I just want to see the first five rows. Sorry, it's a dot. So this is how my data looks now. Here we have done it for Sex.

Then we have the numerical values in Age. We have numerical values in SibSp. Then we have the ticket number, we have the fare, and we have Embarked as well. In Embarked the values are S, C and Q, so here also we can apply the get_dummies function. Let's say I'll take a variable, say embark. I'll use the pandas library and pass in the column name, that is Embarked.

Let me print the head of it, so I'll say embark.head. Over here we have C, Q and S. Here also we can drop the first column, because two values are enough to tell whether the passenger embarked at Q, that is Queenstown, or S, that is Southampton, and if both values are 0, then the passenger definitely embarked at Cherbourg, which is the third value. So you can again drop the first column, so I'll say drop_first. Let me run this. So this is how my output looks now. Similarly, you can do it for passenger class as well. Here also we have three classes, 1, 2 and 3, so I'll just copy the whole statement. Let's say I want the variable name to be pcl. I'll pass in the column name, that is Pclass, and I'll drop the first column.

So here also the values will be 1, 2 or 3, and I'll remove the first column. That leaves us with just 2 and 3, so if both values are 0, the passenger is definitely traveling in first class. Now we have turned these categorical values into dummy variables. My next step would be to concatenate all these new columns into the data set, or you can say titanic_data. Using pandas, we'll concatenate all these columns. So I'll say pd.concat, and then say we have to concatenate sex, embark and pcl, and then I'll mention the axis as 1. I'll run this and then print the head. Over here you can see that these columns have been added.
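The dummy-variable steps above can be sketched as follows on a small made-up frame. The variable names sex, embark and pcl follow the session; passing `dtype=int` is an added assumption so that the dummies print as 0 and 1, since recent pandas versions return True and False by default.

```python
import pandas as pd

# Made-up rows for illustration only.
titanic_data = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 2, 3],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# drop_first=True removes the first category (female, C, 1), which is implied
# whenever all the remaining dummy columns are 0.
sex = pd.get_dummies(titanic_data["Sex"], drop_first=True, dtype=int)
embark = pd.get_dummies(titanic_data["Embarked"], drop_first=True, dtype=int)
pcl = pd.get_dummies(titanic_data["Pclass"], drop_first=True, dtype=int)

# axis=1 glues the new columns on side by side.
# Note that the Pclass dummies get the integer column names 2 and 3.
titanic_data = pd.concat([titanic_data, sex, embark, pcl], axis=1)
print(titanic_data.head())
```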

So we have the male column, which tells whether the person is male or female. Then we have the Embarked dummies, which are Q and S: if the passenger embarked at Queenstown, the value of Q would be 1, else it would be 0, and if both of these values are 0, the passenger definitely embarked at Cherbourg. Then we have the passenger class as 2 and 3. If the value of both of these is 0, the passenger is traveling in class 1. So I hope you got this till now. Now, the original columns that we have converted are no longer needed, so we can drop them. We'll drop Pclass, the Embarked column and the Sex column.

So I'll type titanic_data.drop and mention the columns that I want to drop. I'll even delete the passenger ID, because it's nothing but an index value starting from 1, so I'll drop this as well. I don't want Name either, so I'll delete that too. What else can we drop? We can drop Ticket as well. And then I'll mention the axis and say inplace is equal to True. Okay, my column names start with uppercase letters. So these have been dropped. Now let me print my data set again. So this is my final data set, guys. We have the Survived column, which has the values 0 and 1, then we have the passenger class. Oh, we forgot to drop this as well. No worries, I'll drop it now. Let me run this. So over here we have Survived, Age, SibSp, Parch, Fare and male, plus the dummy columns we have just converted.
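A minimal sketch of this clean-up step, again on made-up rows: the original text columns and the identifier columns are dropped in one call, leaving only numeric columns. The list name `redundant` is a hypothetical helper name.

```python
import pandas as pd

# Made-up rows that already carry the dummy columns male, Q and S.
titanic_data = pd.DataFrame({
    "PassengerId": [1, 2],
    "Survived": [0, 1],
    "Pclass": [3, 1],
    "Name": ["passenger A", "passenger B"],
    "Sex": ["male", "female"],
    "Age": [22.0, 38.0],
    "Ticket": ["T-1", "T-2"],
    "Embarked": ["S", "C"],
    "male": [1, 0],
    "Q": [0, 0],
    "S": [1, 0],
})

# Column names are case-sensitive, so "Sex" works where "sex" would raise an error.
redundant = ["PassengerId", "Name", "Ticket", "Sex", "Embarked", "Pclass"]
titanic_data.drop(redundant, axis=1, inplace=True)

print(titanic_data.head())
```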

So here we have performed data wrangling, or you can say cleaned the data, and then we have converted the values of gender to male, Embarked to Q and S, and the passenger class to 2 and 3. So this was all about my data wrangling, or just cleaning the data. My next step is training and testing your data. Here we'll split the data set into a train subset and a test subset. Then we'll build a model on the train data and predict the output on the test data. So let me go back to Jupyter and implement this as well. Over here I need to train my data set, so I'll put this in heading 3. You need to define your dependent variable and independent variables. Here my y is the output, or you can say the value that I need to predict, so I'll write titanic_data and take the column Survived. So basically I have to predict this column, whether the passenger survived or not. As you can see, we have a discrete outcome in the form of 0 and 1, and all the rest we can take as features, or you can say independent variables.

So I'll say titanic_data.drop, so we just simply

drop Survived, and all the other columns

will be my independent variables. So everything else is

the features which lead to the survival rate. So once we have defined

the independent variable and the dependent variable,

the next step is to split your data into training

and testing subsets. So for that we will

be using sklearn. I'll just type in from sklearn.cross_validation

import train_test_split. Now here, if you just press

Shift and Tab, you can go to the documentation and you can just see

the examples over here. I'll click the plus to open it, and then I'll just go

to the examples and see how you can split your data. So over here you have

X_train, X_test, y_train, y_test, and then using

train_test_split you can just pass in your independent variable

and dependent variable and just define a size

and a random state for it. So, let me just copy this

and I'll just paste it over here.

Over here, we have X train and test. Then we have the dependent

variable train and test, and using the split function

we'll pass in the independent and dependent variables,

and then we'll set a split size. So let's say I'll put it as 0.3. So this basically means that your data set

is divided with a test size of 0.3, that is, in a 70/30 ratio. And then I can add

any random state to it. So let's say I'm applying

one. This is not necessary. If you want the same result

as that of mine, you can add the random state. So this would basically

take exactly the same sample every time. Next I have to train

and predict by creating a model. So here logistic

regression we'll grab from the linear model module. So next I'll just type in from sklearn.linear_model

import LogisticRegression.

Next I'll just create the instance of this

logistic regression model. So I'll say logmodel is equal

to LogisticRegression. Now I just need to fit my model. So I'll say logmodel.fit and I'll just pass

in my X_train and y_train. It gives me all the details

of logistic regression. So here it gives me the class

weight, dual, fit_intercept and all those things. Then

what I need to do, I need to make predictions. So I will take a variable,

say predictions, and I'll pass the model's output to it. So I'll say

logmodel.predict and I'll pass in the value

that is X_test.
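
The split, fit and predict steps can be sketched as follows. The data is synthetic (an assumption, since the cleaned Titanic frame is not reproduced here), and `train_test_split` is imported from `sklearn.model_selection`, which is where current scikit-learn releases expose it; the older `sklearn.cross_validation` module named in the walkthrough has since been removed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned Titanic features (X) and Survived column (y).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# test_size=0.3 gives the 70/30 split; random_state=1 makes it repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

logmodel = LogisticRegression()
logmodel.fit(X_train, y_train)
predictions = logmodel.predict(X_test)
print(predictions[:10])
```

Because the random state is fixed, rerunning the cell selects the same 140 training rows and 60 test rows each time.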

So here we have just

created a model, fit that model, and then we

have made predictions. So now, to evaluate how my model

has been performing, you can simply

calculate the accuracy, or you can also calculate

a classification report. So don't worry guys, I'll be showing both

of these methods. So I'll say from sklearn.metrics

import classification_report. I'll use classification_report, and inside this I'll be passing

in y_test and the predictions. So guys, this is

my classification report.
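
A minimal sketch of the `classification_report` call. The labels below are hand-made to keep the arithmetic easy to follow; they are not the Titanic results.

```python
from sklearn.metrics import classification_report

# Small hand-made labels, only to show the call; they are not the Titanic results.
y_test = [0, 0, 0, 0, 1, 1, 1, 1]
predictions = [0, 0, 0, 1, 1, 1, 0, 1]

# output_dict=True returns the same numbers as the printed table, as a dictionary.
report = classification_report(y_test, predictions, output_dict=True)
print(classification_report(y_test, predictions))
```

Each class here has three correct predictions out of four, so precision and recall both come out as 0.75, with a support of 4 per class.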

So over here,

I have the precision, I have the recall, we have the f1-score,

and then we have support. So here we have the values

of precision as 75, 72 and 73, which is not that bad. Now, in order to calculate

the accuracy as well, you can also use the concept

of a confusion matrix. So if you want to print

the confusion matrix, I will simply say from sklearn.metrics import

confusion_matrix first of all, and then we'll just

print this. So now my function

has been imported successfully, so I'll say confusion_matrix.

And I'll again pass in

the same variables, which are y_test

and predictions. So I hope you guys already know

the concept of a confusion matrix. So can you guys give me

a quick confirmation as to whether you guys remember this confusion

matrix concept or not? So if not, I can just quickly

summarize this as well. Okay, so some of you say yes. Okay. So for those who are not clear with this, I'll just tell

you in brief what a confusion matrix is all about. So a confusion matrix is nothing

but a 2 by 2 matrix which has four outcomes.

This basically tells us how accurate

your values are. So here we have

the columns as predicted no and predicted yes, and we

have the rows as actual no and actual yes. So this is the concept

of a confusion matrix.

So here let me just feed

in these values which we have just calculated. So here we have 105, 21, 25 and 63. So

as you can see here, we have got four outcomes. Now 105 is the value

where the model has predicted no, and in reality it was also a no, so here we have predicted no

and actual no. Similarly, we have 63 as a predicted yes. So here the model predicted yes, and actually

it was also a yes. So in order to

calculate the accuracy, you just need to add these two values and divide

the whole by the sum of all four values. So here these two values

tell me where the model has predicted the correct output. So this value is also

called true negative, this is called

false positive, this is called true positive, and this is called

false negative. Now, in order to

calculate the accuracy, you don't have

to do it manually. So in Python, you can just import

the accuracy_score function and you can get

the results from that.
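
To make the arithmetic concrete, here is a sketch that rebuilds label arrays from the four counts quoted above and checks that the manual formula and `accuracy_score` agree. It assumes 21 is the false-positive count and 25 the false-negative count, following scikit-learn's layout of actual classes in rows and predicted classes in columns.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Counts quoted in the walkthrough: rows are actual no/yes, columns are predicted no/yes.
tn, fp, fn, tp = 105, 21, 25, 63

# Rebuild label arrays that would produce exactly those counts.
y_test = np.array([0] * (tn + fp) + [1] * (fn + tp))
predictions = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)

matrix = confusion_matrix(y_test, predictions)
manual_accuracy = (tn + tp) / (tn + fp + fn + tp)   # 168 / 214
library_accuracy = accuracy_score(y_test, predictions)

print(matrix)
print(round(manual_accuracy, 3), round(library_accuracy, 3))
```

Both values come out at roughly 0.785, which is the figure of about 78 percent reported for the model.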

So I'll just do that as well. So I'll say from sklearn.metrics

import accuracy_score, and I'll simply

print the accuracy. I'm passing the same variables, that is, y_test

and predictions. So over here it tells me the accuracy as 78 percent,

which is quite good. So over here, if you want to do it manually, you

have to add these two numbers, which is 105 plus 63. So this comes out to 168,

and then you have to divide by the sum of all

the four numbers. So 105 plus 63 plus 21 plus 25, so this gives me

a result of 214. So now if you divide these two numbers, you'll get

the same accuracy, that is 78%, or you can say 0.78. So that is how you

can calculate the accuracy. So now let me just go back

to my presentation and let's see what all we have

covered till now.

So here we have first split our data

into train and test subsets, then we have built a model

on the train data and then predicted the output

on the test data set, and then my fifth step

is to check the accuracy. So here we have calculated the accuracy to be almost

seventy-eight percent, which is quite good. You cannot say

that the accuracy is bad. So here it tells me

how accurate your results are. So here the accuracy score defines that, and hence we've got

a good accuracy. So now moving ahead, let us see the second project,

that is SUV data analysis. So in this, a car company has

released a new SUV in the market, and using the previous data

about the sales of their SUV,

they want to predict

the category of people who might be interested

in buying this. So using

logistic regression, you need to find what factors

made people more interested in buying this SUV. So for this, let us see the data set,

where I have UserId, I have gender as male

and female, then we have the age, we have the estimated salary, and then we have

the Purchased column. So this is my discrete column, or you can say

the categorical column. So here we just have the values 0 and 1, and this is the column

we need to predict, whether a person can actually

purchase an SUV or not.

So based on these factors,

we will be deciding whether a person can

actually purchase the SUV or not. So we know the salary

of a person, we know the age, and using these we can predict whether a person can

actually purchase the SUV or not. Let me just go to my Jupyter notebook and implement

logistic regression. So guys, I will not be going

through all the details of data cleaning and analyzing,

that part. I'll just leave it to you. So just go ahead

and practice as much as you can. All right, so the second project

is SUV predictions. All right, so first of all, I have to import

all the libraries, so I'll say import numpy

as np, and similarly

I'll do the rest of it. All right, so now let

me just print the head of this data set. So this we have already seen,

that we have columns as UserId, we have gender, we have the age, we have the salary,

and then we have to calculate whether a person can actually

purchase an SUV or not. So now let us just simply go on

to the algorithm part.

So we'll directly start off

with the logistic regression and how you can train a model. So

for doing all those things, we first need to define

an independent variable and a dependent variable. So in this case, I want my X, that is

the independent variable, to be dataset.iloc. So here I will specify

a colon, saying all the rows. So the colon basically stands

for that, and in the columns, I want only two and

three, dot values.
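
A short sketch of position-based selection with `iloc`. The rows are made up, and the column order (user ID, gender, age, estimated salary, purchased) is assumed from the description of the data set; the label selection at position 4 follows from that assumed layout.

```python
import pandas as pd

# Hypothetical rows with the column layout described for the SUV data set.
dataset = pd.DataFrame({
    "User ID": [101, 102, 103],
    "Gender": ["Male", "Female", "Female"],
    "Age": [19, 35, 26],
    "EstimatedSalary": [19000, 20000, 43000],
    "Purchased": [0, 0, 1],
})

# ":" selects every row; [2, 3] selects the columns at positions 2 and 3.
X = dataset.iloc[:, [2, 3]].values
# Position 4 is the label column in this assumed layout.
y = dataset.iloc[:, 4].values

print(X)
print(y)
```

Positions are counted from zero, so 2 and 3 pick out the age and estimated salary columns, and `.values` turns the selection into a NumPy array.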

So this should fetch

me all the rows and only the columns at index two

and three, which are age and estimated salary. So these are the factors which will be used to predict

the dependent variable, that is Purchased. So here my dependent

variable is Purchased, and the independent variables are

age and salary. So I'll say dataset.iloc, I'll have all the rows,

and I just want 4 for the column,

that is my Purchased column, dot values. All right, so I just forgot one square

bracket over here. All right, so over here I have defined my independent

variable and dependent variable. So here my independent variable

is age and salary, and the dependent variable

is the column Purchased. Now, you must be wondering

what this iloc function is. So iloc is basically

an indexer for a pandas DataFrame, and it is used

for integer-based indexing, or you can also say

selection by index. Now, let me just print

these independent variables and the dependent variable. So if I print the independent

variable, I have age as well as salary. Next, let me print the dependent

variable as well. So over here you can see I

just have the values 0 and 1, so 0 stands

for did not purchase. Next, let me just divide my data set

into training and test subsets. So I'll simply write in from sklearn.cross_validation

import train_test_split. Next, I'll

just press Shift and Tab, and over here I will go to the examples

and just copy the same line. So I'll just copy this and remove the dots. Now, I want the test size

to be, let's say, 0.25, so I have divided the train

and test sets in a 75/25 ratio. Now, let's say I'll take

the random state as 0. So random state basically

ensures the same result, or you can say the same samples

are taken whenever you run the code. So let me just run this. Now, you can also scale

your input values for better performance, and this can be done

using StandardScaler, so let me do that as well. So I'll say

from sklearn.preprocessing import StandardScaler. Now, why do we scale it? If you see the data set, we

are dealing with large numbers.

Well, although we are using

a very small data set, whenever you're working

in a prod environment, you'll be working

with large data sets, where you will be using thousands and hundreds of thousands of

tuples, so there scaling down will definitely

affect the performance to a large extent. So here let me just show you how you can scale down

these input values. The preprocessing module contains all

the methods and functionality which are required

to transform your data. So now let us scale down the test as well as

the training data set. So I'll first make

an instance of it. So I'll say sc is equal to StandardScaler, then I'll have X_train is equal to sc.fit_transform,

and I'll pass in my X_train variable. And similarly I can do

it for test, wherein I'll pass X_test.
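
A minimal sketch of the scaling step with hypothetical age and salary rows. Using `transform` rather than `fit_transform` on the test rows is an assumption about the notebook; it is the usual way to reuse the statistics learned from the training rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical age / estimated-salary rows on very different scales.
X_train = np.array([[19, 19000], [35, 20000], [26, 43000], [47, 90000]], dtype=float)
X_test = np.array([[30, 50000], [40, 25000]], dtype=float)

sc = StandardScaler()
# fit_transform learns the mean and standard deviation from the training rows...
X_train_scaled = sc.fit_transform(X_train)
# ...and transform reuses those same statistics for the test rows (an assumption
# about the notebook; the walkthrough only says X_test is passed in similarly).
X_test_scaled = sc.transform(X_test)

print(X_train_scaled.mean(axis=0))  # close to 0 for each column
print(X_train_scaled.std(axis=0))   # close to 1 for each column
```

After scaling, age and salary sit on comparable ranges, so neither column dominates purely because of its units.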

All right. Now my next step is

to import logistic regression. So I'll simply apply

logistic regression by first importing it, so I'll say from sklearn.linear_model import

LogisticRegression. Over here, I'll be using classifier. So I'll say classifier is equal to LogisticRegression,

so over here I just make an instance of it. So I'll say LogisticRegression,

and over here I just pass in the random state, which is 0. Now

I simply fit the model,

and I simply pass in

X_train and y_train. So here it tells

me all the details of logistic regression. Then I have to

predict the values. So I'll say y_pred

is equal to classifier,

then the predict function,

and then I just pass in X_test. So now we have

created the model. We have scaled down

our input values. Then we have applied

logistic regression.

We have predicted the values and now we want

to know the accuracy. So for the accuracy, first we

need to import accuracy_score. So I'll say from

sklearn.metrics import accuracy_score, and using this function we

can calculate the accuracy, or you can manually do that by creating

a confusion matrix. So I'll just pass in my y_test and my

y_pred. All right. So over here I get

the accuracy as 0.89. So if we want to know

the accuracy in percentage, I just have to multiply it

by a hundred, and if I run this, it gives me 89%. So I hope you guys are clear with whatever I

have taught you today.
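
Putting the SUV steps together, a hedged end-to-end sketch might look like this. The age and salary data and the purchase rule are synthetic assumptions, so the printed accuracy will not match the 89% reported for the real file, and `sklearn.model_selection` is used because the older `sklearn.cross_validation` module is no longer available in current releases.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic age / salary data standing in for the SUV file, which is not shown here.
rng = np.random.default_rng(0)
age = rng.integers(18, 60, size=400)
salary = rng.integers(15000, 150000, size=400)
X = np.column_stack([age, salary]).astype(float)
# Made-up rule: older, better-paid people buy more often.
y = ((age - 18) / 42 + (salary - 15000) / 135000 + rng.normal(0, 0.2, 400) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# accuracy_score returns a fraction; multiply by 100 for a percentage.
accuracy_percent = accuracy_score(y_test, y_pred) * 100
print(f"accuracy: {accuracy_percent:.1f}%")
```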

So here I have taken

my independent variables as age and salary, and then

we have predicted

how many people

can purchase the SUV, and then we have evaluated

our model by checking the accuracy. So over here

we get the accuracy as 89 percent, which is great. All right guys, that is

it for today. So I'll discuss what we have covered

in today's training. First of all, we had a quick introduction

to what regression is and where regression

is actually used. Then we understood

the types of regression and then got into the details of the what and why

of logistic regression. We compared linear versus

logistic regression. We've also seen

the various use cases where you can implement

logistic regression in real life, and then we picked

up two projects, that is Titanic data analysis and SUV prediction. So

over here we have seen how you can collect your data,

analyze your data, then perform wrangling on that data,

then train the data, test the data, and then finally

we have calculated the accuracy. So in your SUV prediction, you can actually

analyze and clean your data, and you can do a lot of things, so you can just go ahead,

pick up any data set and explore it as

much as you can.

What is classification? I hope every one of you

must have used Gmail. So how do you think the mail

is getting classified as spam or not spam? Well, that's nothing but

classification. So what is it? Well,

classification is the process of dividing the data set

into different categories or groups by adding labels. In other words, you can say

that it is a technique of categorizing the observations

into different categories. So basically what you

are doing is you are taking the data, analyzing it, and on the basis

of some condition you finally divide it

into various categories.

Now, why do we classify it? Well, we classify it

to perform predictive analysis on it. Like when you get a mail, the machine

predicts it to be spam or not spam, and on the basis

of that prediction it adds the irrelevant or spam mail to the respective folder.

In general, this classification

algorithm handles questions like: does this data belong

to category A or category B?

Like, is this a male or is this

a female, something like that. Are you getting it? Okay, fine. Now the question arises,

where will you use it? Well, you can use this

for fraud detection, to check

whether the transaction

is genuine or not. Suppose I am using a credit card here in India. Now, due to some reason I had

to fly to Dubai. Now, if I'm using the credit

card over there, I will get a notification alert

regarding my transaction. They would ask me to confirm

the transaction. So this is also a kind

of predictive analysis, as the machine predicts that something fishy is in the transaction,

as a while ago I made a transaction using

the same credit card in India, and 24 hours later

the same credit card is being

used for a payment in Dubai. So the machine detects that

something fishy is going on in the transaction. So in order to confirm it, it

sends you a notification alert. All right. Well, this is one of

the use cases of classification. You can even use it

to classify different items, like fruits, on the basis

of their taste, color, size or weight. A machine

well trained using the classification algorithm

can easily predict the class or the type of fruit whenever

new data is given to it. Not just fruit, it can be any item. It can be a car. It can be a house.

It can be a signboard, or anything. Have you noticed that while you visit some sites or you try to log in

to some, you get a picture CAPTCHA for that, right, where you have to identify whether the given image is of

a car or of a pole or not? You have to select,

for example, out of 10 images, and you're selecting

three images out of it.

So in a way you are

training the machine, right? You're telling it that these three are

pictures of a car and the rest are not. So who knows, you are training

it for something big, right? So moving on ahead, let's discuss the types

of classification. All right. Well, there are

several different ways to perform the same task.

Like, in order to predict whether a given person is a male or a female, the machine

has to be trained first. All right, but there are multiple ways

to train the machine and you can choose any one of them. Just

for predictive analytics,

there are many

different techniques, but the most common of them

all is the decision tree, which we'll cover in depth

in today's session. So as part

of the classification algorithms, we have decision tree,

random forest, naive Bayes, k-nearest neighbor, logistic regression, linear regression,

support vector machines and so on. There are many. All right, so let me give

you an idea about a few of them, starting

with the decision tree. Well, a decision tree is

a graphical representation of all the possible solutions to a decision. The decisions which are made

can be explained very easily. For example, here is a task which says: should I go

to a restaurant or should I buy a hamburger?

You are confused about that. So for that, what you

will do is create a decision tree for it. Starting with the root node,

first of all, you will check

whether you are hungry or not. All right, if you're not hungry, then

just go back to sleep.

Right? If you are hungry and you have $25, then you

will decide to go to a restaurant, and if you're hungry

and you don't have $25, then you will just

go and buy a hamburger. That's it. All right. So this was about the decision tree.

Now moving on ahead, let's see what a random forest is. Well, a random forest builds

multiple decision trees and merges them together

to get a more accurate and stable prediction. All right, most of the time

a random forest is trained with the bagging method. The bagging method

is based on the idea that a combination of learning models increases

the overall result. If you are combining the

learning from different models and then clubbing it together, what it will do is increase

the overall result. Fine. Just one more thing. If the size of your

data set is huge, then in that case one single

decision tree would lead to an overfit model, the same way a single person

might have their own perspective on the complete population, as

the population is very huge.

Right? However, if we implement

a voting system and ask different individuals

to interpret the data, then we would be able

to cover the pattern in a much more meticulous way.

Even from the diagram, you can see that in section A we have our large

training data set. What we do is

we first divide

our training data set into n subsamples, and we create a decision tree

for each subsample. Now in part B,

what we do is take the vote out of every decision made

by every decision tree. And finally we club

the votes to get the random forest decision. Fine. Let's move on ahead. Next, we have naive Bayes. So naive Bayes is

a classification technique which is based on Bayes' theorem. It assumes that the presence of any particular feature in

a class is completely unrelated to the presence of any other feature.

Naive Bayes is a simple and easy to implement algorithm,

and due to its simplicity this algorithm might outperform

more complex models when the size of the data set

is not large enough. All right, a classical use case of naive Bayes is

document classification.

In that, what you

do is determine whether a given text corresponds to one or more categories.

In the text case, the features used might be

the presence or absence of any keyword. So this was about naive Bayes.

From the diagram, you can see

that using naive Bayes we have to decide whether we have

a disease or not. First, what we do is

check the probability of having a disease and not having the disease,

right? The probability of having a disease is 0.1, while on the other hand

the probability of not having a disease is 0.9.

Okay, first let's see when we have the disease

and we go to the doctor. All right, so when we

visit the doctor and the test is positive,

the probability of having a positive test when you're having the disease

is 0.80, and the probability of a negative test when you already have

the disease is 0.20. This is also a false negative

statement, as the test is detecting negative but you still have

the disease, right? So it's a false

negative statement. Now, let's move ahead to when you don't have

the disease at all. So the probability of not having

the disease is 0.9. And when you visit the doctor

and the doctor is like, yes, you have the disease.

But you already know

that you don't have the disease. So it's a false

positive statement. So the probability of testing positive for

the disease when you actually know there is no disease

is 0.1, and the probability of testing negative when you actually know

there is no disease

is around 0.90. Fine. It is the same as the probability

of not having the disease; even the test is showing

the same result, so it is a true negative statement. So it is 0.9. All right. So let's move on ahead and

discuss the KNN algorithm.
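
Before the KNN discussion, the disease numbers quoted above can be combined with Bayes' theorem to answer the natural follow-up question: given a positive test, how likely is the disease? This is a worked illustration using only the four quoted probabilities; the variable names are made up for the sketch.

```python
# Probabilities quoted in the disease example.
p_disease = 0.1
p_no_disease = 0.9
p_pos_given_disease = 0.8      # true positive rate
p_pos_given_no_disease = 0.1   # false positive rate

# Total probability of a positive test, over both branches of the diagram.
p_pos = p_pos_given_disease * p_disease + p_pos_given_no_disease * p_no_disease

# Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(round(p_pos, 2))                # 0.17
print(round(p_disease_given_pos, 3))  # 0.471
```

So even with a positive test, the chance of actually having the disease is a little under one half, because the disease itself is rare.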

So this KNN algorithm,

or k-nearest neighbor, stores all

the available cases and classifies new cases based

on a similarity measure. The K in the KNN algorithm is

the number of nearest neighbors we wish to take a vote

from. For example, if k equals 1, then the object

is simply assigned to the class of that single nearest neighbor.

From the diagram, you can see the difference

in the image when k equals 1, k equals 3

and k equals 5, right? Well, the systems are now able to use

k-nearest neighbor for visual pattern

recognition, to scan and detect hidden packages

in the bottom bin of a shopping cart

at the checkout. If an object is detected which matches exactly

an object listed in the database,

then the price of the spotted

product could even automatically be added

to the customer's bill. While this automated

billing practice is not used extensively at this time, the technology

has been developed and is available for use. If you want, you can

just use it. And yeah, one more thing: k-nearest

neighbor is also used in retail to detect patterns in credit card usage. Many

new transaction-scrutinizing software applications use KNN algorithms to

analyze register data and spot unusual patterns that indicate

suspicious activity. For example,

if register data indicates that a lot

of customer information is being entered manually rather

than through automated scanning and swiping, then in that case this could indicate that the employees

who are using the register are, in fact, stealing customers'

personal information. Or if the register data indicates that a particular good

is being returned or exchanged multiple times,

this could indicate that employees are misusing

the return policy or trying to make money from

doing fake returns, right? So this was about the KNN algorithm. So starting with

what a decision tree is. But first, let me tell

you why we chose the decision tree to start with. Well, decision trees

are really very easy to read and understand. The decision tree belongs

to one of the few models that are interpretable, where you can understand exactly

why the classifier has made that particular decision, right? Let me tell you a fact:

for a given data set, you cannot say that this algorithm performs

better than that one. It's like you cannot say

that a decision tree is better than naive Bayes, or that naive Bayes is performing better

than a decision tree. It depends on the data set, right? You have to apply

the hit-and-trial method with all the algorithms one by one and then compare

the results. The model which gives the best

result is the model which you can use

for better accuracy on your data set.

All right, so let's start

with what a decision tree is. Well, a decision tree is

a graphical representation of all the possible solutions to a decision based

on certain conditions. Now, you might be wondering

why this thing is called a decision tree. Well, it is called so because it starts with a root and then branches off

to a number of solutions, just like a tree, right? Even

a tree starts from a root, and it starts

growing its branches once it gets bigger and bigger. Similarly,

in a decision tree, it has a root which keeps on growing with

an increasing number of decisions and conditions. Now, let me tell you

a real-life scenario. I won't say all of you, but most of you

must have used it. Remember, whenever you dial

the toll-free number of your credit card company, it redirects you to its intelligent

computerised assistant, where it asks

you questions like: press 1 for English

or press 2 for Hindi, press 3 for this, press

4 for that, right? Now, once you select one, it again redirects you

to a certain set of questions, like press

1 for this, press 2 for that, and similarly, right? So this keeps on repeating until you finally get

to the right person, right? You might think that you are caught

in voicemail hell, but what the company

was actually doing was just using a decision tree

to get you to the right person.

All right. I'd like you to focus

on this particular image for a moment. On

this particular slide, you can see an image

where the task is: should I accept

a new job offer or not? All right, so you have

to decide that. For that, what you did is you created

a decision tree, starting with the base condition,

or the root node, which was that the basic salary or the minimum salary

should be $50,000. If it is not $50,000, then you are not at all

accepting the offer. All right. So if your salary is

greater than $50,000, then you will further check whether the commute is

more than one hour or not.

If it is more than one hour, you

will just decline the offer. If it is less than one hour, then you are getting closer

to accepting the job offer. Then further, what you will do is check

whether the company is offering free coffee or not, right? If the company

is not offering free coffee, then you will just

decline the offer, and if it is offering

free coffee, then yeah, you will happily

accept the offer, right? This is just an example

of a decision tree. Now, let's move ahead

and understand a decision tree. Well, here is a sample data set that I will be using

to explain the decision tree to you. All right, in this data set

each row is an example, and the first two columns

provide features or attributes that describe the data, and the last column

gives the label, or the class, we want to predict. And if you like, you

can just modify this data by adding additional features and more examples, and our program will work

in exactly the same way. Fine.

Now this data set

is pretty straightforward except for one thing. I hope you have noticed that

it is not perfectly separable. Let me tell you something

more about that. The second and fifth examples

have the same features

but different labels.

Both have yellow as the colour and a diameter of three, but the labels are mango

and lemon, right? Let's move on and see how our decision tree

handles this case.
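
To see the "not perfectly separable" point in code, here is a small sketch. Rows 2 and 5 follow the description (yellow, diameter 3, labelled mango and lemon); the remaining rows are hypothetical filler chosen to complete a five-row table.

```python
from collections import defaultdict

# Each row is [colour, diameter, label]. Rows 2 and 5 follow the description
# above; the other rows are hypothetical filler to complete the table.
training_data = [
    ["Green", 3, "Mango"],
    ["Yellow", 3, "Mango"],
    ["Red", 1, "Grape"],
    ["Red", 1, "Grape"],
    ["Yellow", 3, "Lemon"],
]

# Group the labels by feature values to find rows no question can separate.
labels_by_features = defaultdict(set)
for *features, label in training_data:
    labels_by_features[tuple(features)].add(label)

conflicts = {f: sorted(l) for f, l in labels_by_features.items() if len(l) > 1}
print(conflicts)
```

The only conflict is the yellow, diameter-3 pair, so no question about colour or diameter can ever put those two rows in different branches.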

All right, in order to build

a tree we'll use a decision tree algorithm called CART.

This CART algorithm stands for classification and regression tree

algorithm. All right, let's see a preview

of how it works. All right, to begin

with, we'll add a root node for the tree, and all

the nodes receive a list of rows as input, and the root will receive

the entire training data set. Now each node will ask

a true-or-false question about one of the features, and in response

to that question we'll split, or partition, the data set

into two different subsets. These subsets then become

the input to the child nodes we add to the tree, and the goal of the question

is to finally unmix the labels as we proceed down, or in

other words, to produce the purest possible distribution

of the labels at each node. For example, the input

of this node contains only

one single type

of label, so we could say that it's perfectly unmixed. There is no uncertainty

about the type of label, as it consists

of only grapes, right? On the other hand, the labels

in this node are still mixed up, so we would ask another question

to further drill it down, right? But before that, we need to

understand which question to ask and when, and to do that we need to quantify how much a question

helps to unmix the labels. We can quantify

the amount of uncertainty at a single node using

a metric called Gini impurity, and we can quantify how much a question reduces that uncertainty using a concept

called information gain. We'll use these to select the best

question to ask at each point.
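
The two quantities can be sketched in a few lines. The gain here is computed as the drop in Gini impurity, weighted by branch size, which is one common way to score a question and is an assumption about the exact variant intended. The label lists and the helper names `gini` and `info_gain` are made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared label proportions."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def info_gain(parent, true_branch, false_branch):
    """Parent impurity minus the size-weighted impurity of the two children."""
    p = len(true_branch) / len(parent)
    return gini(parent) - p * gini(true_branch) - (1 - p) * gini(false_branch)

# Hypothetical labels at a node, and the split produced by the question
# "is the colour green?" on an assumed five-row fruit table.
parent = ["Mango", "Mango", "Grape", "Grape", "Lemon"]
true_branch = ["Mango"]
false_branch = ["Mango", "Grape", "Grape", "Lemon"]

print(round(gini(["Grape", "Grape"]), 2))  # 0.0, a perfectly unmixed node
print(round(gini(parent), 2))              # 0.64
print(round(info_gain(parent, true_branch, false_branch), 2))  # 0.14
```

A node holding only grapes scores 0, the mixed parent scores 0.64, and the candidate question is judged by how much of that 0.64 it removes.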

And then what we'll do is

iterate the steps. We'll recursively build the tree on each of the new nodes

and continue dividing the data until there are

no further questions to ask, and finally we

reach our leaf. All right,

so this was about the decision tree. So in order to create

a decision tree, first of all what you have to do is

identify the different sets of questions that you can ask of a tree,

like: is this colour green? And what will these questions be?

These questions will be decided by your data set, like: is

this colour green, is the diameter greater

than or equal to 3, is the colour yellow, right? The questions resemble

your data set, remember that. All right. So if my colour is green, then what it will do, it

will divide into two part first.

The Green Mango will be

in the true while on the false. We have lemon

and the map all right. And if the color is green or the diameter is greater

than equal to 3 or the color is yellow. Now let's move on and understand about

decision tree terminologies. All right, so starting with the root node: the root node

is the base node of a tree. The entire tree starts

from the root node.

In other words, it is the first node

of a tree. It represents the entire population or sample, and this entire population

is further segregated, or divided, into two

or more homogeneous sets. Fine. Next is the leaf node. Well, a leaf node is the one you reach

at the end of the tree, right? That is, you

cannot segregate it further down to any other level. That is the leaf node. Next is splitting. Splitting

is dividing your root node, or any node, into different sub-parts

on the basis of some condition. All right, then comes

the branch, or the subtree. Well, this branch

or subtree gets formed when you split the tree. Suppose

when you split a root node, it gets divided

into two branches or two subtrees, right? Next is the concept of pruning. Well, you can say

that pruning is just the opposite of splitting. What we

are doing here is just removing

the sub-nodes of a decision tree. We'll see more about pruning

later in this session. All right, let's move on ahead.

Next is parent or child node. Well, first of all, the root node is always a parent node, and all other nodes associated with it are known as child nodes. You can understand it this way: the top node is the parent node, and the bottom nodes which are derived from a top node are child nodes. The node that is produced from another node is a child node, and the node which produces it is the parent node. Simple concept, right?

Let's use the CART algorithm and design a tree manually.

So first of all, you decide which question to ask and when. How will you do that? Let's first visualize the decision tree. So this is the decision tree which we'll be creating manually. First of all, let's have a look at the data set. You have Outlook, temperature, humidity and windy as your different attributes, and on the basis of those you have to predict whether you can play or not. So which one among them should you pick first? Answer: determine the best attribute that classifies the training data. All right. So how will you choose the best attribute, or how does a tree decide where to split, or how will the tree decide its root node? Well, before we move on and split a tree, there are some terminologies that you should know.

All right, the first being the Gini index. So what is this Gini index? The Gini index is the measure of impurity, or purity, used in building a decision tree in the CART algorithm. All right.

Next is information gain. Information gain is the decrease in entropy after a data set is split on the basis of an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain. So you will be selecting the node that gives you the highest information gain.

All right, next is reduction in variance. Reduction in variance is an algorithm which is used for continuous target variables, that is, regression problems. The split with the lower variance is selected as the criterion to split the population. In general terms, what do you mean by variance? Variance is how much your data is varying, right? So if your data is less impure, or more pure, then the variation would be less, as all the data is almost similar. So this is also a way of splitting a tree: the split with the lower variance is selected as the criterion to split the population.
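To make the reduction-in-variance idea concrete, here is a minimal sketch in Python. The target values and the two candidate splits are made up purely for illustration, and `weighted_variance` is a hypothetical helper name, not something from the course code.

```python
from statistics import pvariance

def weighted_variance(groups):
    """Size-weighted average of the population variance of each child group."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * pvariance(g) for g in groups)

# Hypothetical continuous target values for six rows (illustration only)
targets = [10, 12, 11, 30, 32, 31]

# Two candidate splits of the same rows
split_a = [targets[:3], targets[3:]]      # low values on one side, high on the other
split_b = [targets[0::2], targets[1::2]]  # low and high values mixed on both sides

# The split whose children have the lower weighted variance is preferred
best = min([split_a, split_b], key=weighted_variance)
print(best)
```

With these made-up numbers the first split wins, because each of its child groups holds values that are almost similar.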

All right. Next is chi-square. It is an algorithm which is used to find out the statistical significance of the differences between the sub-nodes and the parent node. Fine. Let's move ahead.

Now the main question is: how will you decide the best attribute? For now, just understand that you need to calculate something known as information gain. The attribute with the highest information gain is considered the best. Yeah, I know your next question might be: what is this information gain? But before we move on and see what exactly information gain is, let me first introduce you to a term called entropy, because this term will be used in calculating the information gain. Well, entropy is just a metric which measures the impurity of something. In other words, you can say that it is the first step to do before you solve the problem of a decision tree. As I mentioned, it is something about impurity.

So let's move on and understand what impurity is. Suppose you have a basket full of apples and a bowl full of labels which all say "apple". Now if you are asked to pick one item each from the basket and the bowl, then the probability of getting an apple and its correct label is 1. So in this case, you can say that the impurity is zero.

All right. Now what if there are four different fruits in the basket and four different labels in the bowl? Then the probability of matching the fruit to a label is obviously not one. It's something less than that. It could be that I pick a banana from the basket, and when I randomly pick a label from the bowl, it says cherry. Any random permutation and combination is possible. So in this case, I'd say that the impurity is nonzero. I hope the concept of impurity is clear.

So coming back to entropy: as I said, entropy is the measure of impurity. From the graph on your left, you can see that when the probability is zero or one, that is, when the sample is completely pure (all of one class), the value of entropy is zero. And when the probability is 0.5, the value of entropy is maximum. Well, what is impurity? Impurity is the degree of randomness, how random the data is. So if the data is completely pure, the randomness equals zero, whether it is all yes or all no.

A question like "why is it that the value of entropy is maximum at 0.5?" might arise in your mind, right? So let me discuss that. Let me derive it mathematically. As you can see here on the slide, we have the mathematical formula of entropy. Let's move on and see what this graph has to say mathematically.

Suppose S is our total sample space and it's divided into two parts, yes and no. Like in our data set, the result for playing was divided into two parts, yes or no, which we have to predict: either we play or we don't. So for that particular case, you can define the formula of entropy as:

Entropy(S) = - P(yes) log2 P(yes) - P(no) log2 P(no)

where S is your total sample space, P(yes) is the probability of yes, and P(no) is the probability of no. Well, if the number of yes equals the number of no, that is, the probability of yes equals 0.5, then in that case the value of entropy will be one. Just put the values in there. All right, let me move to the next slide and show you this. Next, if it contains all yes or all no, that is, the probability of yes is either 1 or 0, then in that case entropy will be equal to 0. Let's see this mathematically, one by one. So let's start with the first condition, where the probability was 0.5.
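The entropy formula stated here can be sketched in a few lines of Python. This is only an illustrative sketch, and the function name `entropy` is my own choice; the boundary cases discussed in this part of the session (probability 0.5, and all yes or all no) can be checked directly with it.

```python
from math import log2

def entropy(p_yes):
    """Binary entropy (log base 2) of a sample where P(yes) = p_yes."""
    total = 0.0
    for p in (p_yes, 1.0 - p_yes):
        if p > 0:  # by convention, a term with p == 0 contributes nothing
            total -= p * log2(p)
    return total

print(entropy(0.5))     # equal number of yes and no
print(entropy(1.0))     # all yes
print(entropy(0.0))     # all no
print(entropy(9 / 14))  # 9 yes out of 14, as in the play data set
```

The first call gives 1, the next two give 0, and the last gives roughly 0.94.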

So this is our formula for entropy, right? And this is the first case we'll discuss: when the probability of yes equals the probability of no, that is, in our data set we have an equal number of yes and no. So the probability of yes equals the probability of no, and that equals 0.5. In other words, you can say that yes plus no equals the total sample space. Since the probability is 0.5, when you put the values in the formula you get something like this, and when you calculate it, you get the entropy of the total sample space as one. All right.

Let's see the next case. What is the next case? Either you have all yes or you have all no. So if you have all yes, let's see the formula. You have all yes and zero no. So the probability of yes equals one.

And yes is the total sample space, obviously. So when you put that in the formula, you get entropy of the sample space equals negative of 1 multiplied by log of 1. As the value of log 1 equals 0, the whole thing results in 0. Similarly is the case with no: even in that case, you will get the entropy of the total sample space as 0. So this was all about entropy.

All right. Next: what is information gain? Well, information gain measures the reduction in entropy. It decides which attribute should be selected as the decision node. If S is our total collection, then information gain equals the entropy, which we calculated just now, minus the weighted average multiplied by the entropy of each feature. Don't worry, we'll see how to calculate it with an example.

All right. So let's manually build a decision tree for our data set. This is our data set, which consists of 14 different instances, out of which we have nine yes and five no. We have the formula for entropy, so just put the values in: the probability of yes equals 9 by 14 and the probability of no equals 5 by 14. When you put in the values and calculate the result, you will get the value of entropy as 0.94. All right. So this was your first step: compute the entropy for the entire data set.

Now you have to select, out of Outlook, temperature, humidity and windy, which node you should select as the root node. Big question, right? How will you decide that this particular node should be chosen as the base node, on the basis of which the entire tree will be created? Let's see. You have to do it one by one: you have to calculate the entropy and information gain for each of the different nodes. So starting with Outlook. Outlook has three different values: sunny, overcast and rainy. First of all, count how many yes and no there are in the case of sunny. In total we have two yes and three no in the case of sunny. In the case of overcast, we have all yes, so if it is overcast we will surely go to play. And next, if it is rainy, then the total number of yes equals three and the total number of no equals two. Fine. Next, we calculate the entropy for each feature. Here we are calculating the entropy when Outlook equals sunny. First of all, we are assuming that Outlook is our root node, and for that we are calculating the information gain for it.

All right. So in order to calculate the information gain, remember the formula: it was entropy of the total sample space minus the weighted average multiplied by the entropy of each feature. So what we are doing here is calculating the entropy of Outlook when it was sunny. The total number of yes when it was sunny was two, and the total number of no was three. Fine. So let's put that in the formula: the probability of yes is 2 by 5 and the probability of no is 3 by 5.

So you will get something like this. All right, so you are getting the entropy of sunny as 0.971. Fine. Next we will calculate the entropy for overcast. When it was overcast, remember, it was all yes, right? So the probability of yes equals 1, and when you put that in, you will get the value of entropy as 0. Fine. And when it was rainy: rainy has three yes and two no. So the probability of yes in the case of rainy is 3 by 5 and the probability of no in the case of rainy is 2 by 5. And when you put the probability of yes and the probability of no into the formula, you get the entropy of rainy as 0.971.

Now, you have to calculate how much information you are getting from Outlook, and that equals the weighted average. All right. So what was this weighted average based on? The total number of yes and the total number of no. So information from Outlook starts with 5 by 14. Where does this 5 come from? We are counting the number of samples within that particular Outlook value. When it was sunny, there were two yes and three no, so the weight for sunny is 5 by 14. Since the formula was weighted average multiplied by the entropy of each feature, and as calculated the entropy for sunny is 0.971, we multiply 5 by 14 with 0.971. Well, this was the calculation for information when Outlook equals sunny, but Outlook also takes the values overcast and rainy. For those, we calculate similarly: for overcast the weight is 4 by 14, multiplied by its entropy, that is 0; and for rainy it is the same 5 by 14, three yes and two no, multiplied by its entropy, that is 0.971. And finally we take the sum of all of them, which equals 0.693.

Next, we will calculate the information gain. What we did earlier was the information taken from Outlook. Now we are calculating the information we are gaining from Outlook. This information gain equals the total entropy minus the information that is taken from Outlook.
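The Outlook arithmetic described here can be reproduced with a short Python sketch. The yes/no counts per Outlook value are the ones given in the session (overcast is taken as 4 yes and 0 no, since its weight is 4 by 14); the function and variable names are my own.

```python
from math import log2

def entropy(counts):
    """Entropy (log base 2) of a group, given its class counts, e.g. (yes, no)."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

# (yes, no) counts for each Outlook value, as stated in the session
outlook = {"sunny": (2, 3), "overcast": (4, 0), "rainy": (3, 2)}

n = sum(sum(c) for c in outlook.values())  # 14 instances in total
total_entropy = entropy((9, 5))            # entropy of the whole data set

# Weighted average of the per-value entropies: "information taken from Outlook"
info_outlook = sum(sum(c) / n * entropy(c) for c in outlook.values())
gain_outlook = total_entropy - info_outlook

for value, counts in outlook.items():
    print(value, round(entropy(counts), 3))
print(round(info_outlook, 3), round(gain_outlook, 3))
```

The unrounded numbers differ slightly from the slide values in the third decimal place, because the slides round intermediate results.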

All right, so the total entropy we had was 0.94, minus the information we took from Outlook, 0.693. So the value of information gained from Outlook comes to 0.247. All right.

So next, let's assume that windy is our root node. Windy consists of two values, false and true. Let's see how many yes and how many no there are in the case of true and false. When windy has false as its value, it has six yes and two no.

And when it has true as its value, it has three yes and three no. All right. So let's move ahead and similarly calculate the information taken from windy, and finally calculate the information gained from windy.

So first of all, we'll calculate the entropy of each feature, starting with windy equals true. In the case of true, we had an equal number of yes and no. Remember the graph: when the total number of yes equals the total number of no, the probability is 0.5 and the entropy equals 1. So we can directly write the entropy of true, when it's windy, as one, as we have already proved that when the probability equals 0.5 the entropy is at its maximum, which equals 1. All right.

Next is the entropy of false, when it is not windy. Similarly, just put the probabilities of yes and no in the formula and then calculate the result, since you have six yes and two no.

So in total, you'll get the probability of yes as 6 by 8 and the probability of no as 2 by 8. When you calculate it, you will get the entropy of false as 0.811.

All right, now let's calculate the information from windy. The total information taken from windy equals the information taken when windy equals true plus the information taken when windy equals false. We'll calculate the weighted term for each one of them and then sum them up to finally get the total information taken from windy. So in this case, it equals 8 by 14 multiplied by 0.811, plus 6 by 14 multiplied by 1. What is this 8? It is the total number of yes and no in the case when windy equals false: the number of yes equals 6 and the number of no equals 2, which sums to 8. That is why the weight comes to 8 by 14. Similarly, for windy equals true, 3 yes plus 3 no equals 6, divided by the total number of samples, that is 14, multiplied by 1, the entropy of true. So it is 8 by 14 multiplied by 0.811 plus 6 by 14 multiplied by 1, which results in 0.892. This is the information taken from windy.

Now, how much information are you gaining from windy? The total information gained from windy equals the total entropy minus the information taken from windy, that is 0.94 minus 0.892, which equals 0.048. So 0.048 is the information gained from windy. All right.

Similarly, we calculated it for the rest too. For Outlook, as you can see, the information was 0.693 and its information gain was 0.247. In the case of temperature, the information was around 0.911 and the information gain was 0.029. In the case of humidity, the information gain was 0.152, and in the case of windy, the information gain was 0.048. So what we'll do is select the attribute with the maximum information gain. Fine.

Now, we have selected Outlook as our root node, and it is further subdivided into three different parts: sunny, overcast and rainy. In the case of overcast, we have seen that it consists of all yes, so we can consider it a leaf node. But in the case of sunny and rainy, it's doubtful, as each consists of both yes and no, so you need to recalculate the things again for this node.
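As a rough sketch, the root selection can be checked in Python by computing the information gain of all four attributes and taking the maximum. The yes/no counts for Outlook and windy are the ones given in the session. The counts for temperature and humidity are not spelled out in the narration, so the values below are an assumption taken from the widely used 14-row play data set; they do reproduce the gains of 0.029 and 0.152 quoted above.

```python
from math import log2

def entropy(counts):
    """Entropy (log base 2) of a group, given its class counts, e.g. (yes, no)."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(branches, parent=(9, 5)):
    """Parent entropy minus the size-weighted entropy of the branches."""
    n = sum(sum(b) for b in branches)
    weighted = sum(sum(b) / n * entropy(b) for b in branches)
    return entropy(parent) - weighted

# (yes, no) counts per attribute value
attributes = {
    "Outlook": [(2, 3), (4, 0), (3, 2)],      # sunny, overcast, rainy (from the session)
    "Temperature": [(2, 2), (4, 2), (3, 1)],  # hot, mild, cool (assumed counts)
    "Humidity": [(3, 4), (6, 1)],             # high, normal (assumed counts)
    "Windy": [(6, 2), (3, 3)],                # false, true (from the session)
}

gains = {name: information_gain(b) for name, b in attributes.items()}
root = max(gains, key=gains.get)

for name, gain in gains.items():
    print(name, round(gain, 3))
print("root:", root)
```

Outlook comes out with the highest gain, which is why it is chosen as the root node.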

You have to recalculate the things: you again have to select the attribute which has the maximum information gain. All right, so this is how your complete tree will look. So let's see when you can play. You can play when the Outlook is overcast; in that case, you can always play. If the Outlook is sunny, you will drill down further to check the humidity condition.

All right, if the humidity is normal, then you will play. If the humidity is high, then you won't play. When the Outlook says that it's rainy, then you will further check whether it's windy or not. If it is a weak wind, then you will go and play, but if there is a strong wind, then you won't play, right? So this is how your entire decision tree would look at the end.

Now comes the concept of pruning. Say you ask: what should I do to play? Well, you have to do pruning; pruning will decide how you will play.

What is this pruning? Well, pruning is nothing but cutting down the nodes in order to get the optimal solution. All right. So what pruning does is reduce the complexity. As you can see on the screen, it is showing only the results which say that you can play.

All right. Before we drill down to a practical session, a common question might come to your mind. You might think: are tree-based models better than linear models? You may think, if I can use logistic regression for a classification problem and linear regression for a regression problem, then why is there a need to use a tree? Well, many of us have this question in mind, and it's a valid question too. Actually, as I said earlier, you can use any algorithm; it depends on the type of problem you're solving. Let's look at some key factors which will help you decide which algorithm to use and when.

The first point: if the relationship between the dependent and independent variables is well approximated by a linear model, then linear regression will outperform a tree-based model. Second case: if there is high non-linearity and a complex relationship between the dependent and independent variables, a tree model will outperform a classical regression model. Third case: if you need to build a model which is easy to explain to people, a decision tree model will usually do better than a linear model, as decision tree models are simpler to interpret than linear regression.

All right. Now let's move on ahead and see how you can write a decision tree classifier from scratch in Python using the CART algorithm. For this, I will be using Jupyter Notebook with Python 3.0 installed on it. So let's open Anaconda and the Jupyter Notebook. So this is the Anaconda Navigator, and I will directly jump over to Jupyter Notebook and hit the launch button. I guess everyone knows that Jupyter Notebook is a web-based interactive computing notebook environment where you can run your Python code. My Jupyter Notebook opens on my localhost, port 8891. So I will be using this Jupyter Notebook in order to write my decision tree classifier using Python. For this decision tree classifier, I have already written a set of code.

Let me explain it to you one by one. We'll start with initializing our training data set. So this is our sample data set, in which each row is an example. The last column is the label and the first two columns are the features. If you want, you can add some more features and examples for your practice. An interesting fact is that this data set is designed in such a way that the second and fifth examples have the same features, but they have different labels. So let's move on and see how the tree handles this case. As you can see here, both of them, the second and the fifth rows, have the same features. What differs is just their label, right? So let's move ahead.

So this is our training data set. Next, we are adding some column labels. They are used only to print the tree. So we'll add a header to the columns: the first column is color, the second is diameter, and the third is the label column. All right. Next, we'll define a function called unique values, to which we'll pass the rows and a column. What this function does is find the unique values for a column in the data set. So this is an example of that. Here we are passing the training data as the rows and the column number as 0, so we are finding the unique values in terms of color.

And in this one, since the rows are the training data and the column is 1, we are finding the unique values in terms of diameter. Fine. So this is just an example.

Next, we'll define a function called class count, and we'll pass the rows into it. What it does is count the number of each type of example within the data set, or in other words, it counts the occurrences of each unique label in the data set. As a sample, you can see here that we can pass the entire training data set to this class_count function, and it will find all the different types of label within the training data set. As you can see here, the unique labels are mango, grape and lemon.

Next, we'll define a function called is numeric, and we'll pass a value into it. It just tests whether the value is numeric or not, and it returns true if the value is an integer or a float. For example, you can see that when we pass 7 to is numeric, it is an integer, so it returns true, and when we pass "Red", it's not a numeric value, so it returns false. So moving on ahead, we define a class named Question.
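Before going further into the Question class, the data set and helper functions covered up to this point can be sketched as follows. The exact rows and function names are not visible in the narration, so the training data below is a reconstruction that matches what is described (two features, mango/grape/lemon labels, second and fifth rows with the same features but different labels), and the names `unique_vals`, `class_counts` and `is_numeric` are assumptions.

```python
# Reconstructed toy data set: each row is [color, diameter, label]
training_data = [
    ["Green", 3, "Mango"],
    ["Yellow", 3, "Mango"],
    ["Red", 1, "Grape"],
    ["Red", 1, "Grape"],
    ["Yellow", 3, "Lemon"],
]

# Column labels, used only when printing the tree
header = ["color", "diameter", "label"]

def unique_vals(rows, col):
    """Find the unique values for a column in the data set."""
    return set(row[col] for row in rows)

def class_counts(rows):
    """Count how many times each label (last column) appears in the rows."""
    counts = {}
    for row in rows:
        label = row[-1]
        counts[label] = counts.get(label, 0) + 1
    return counts

def is_numeric(value):
    """Return True if the value is an integer or a float."""
    return isinstance(value, (int, float))

print(unique_vals(training_data, 0))  # unique colors
print(unique_vals(training_data, 1))  # unique diameters
print(class_counts(training_data))
print(is_numeric(7), is_numeric("Red"))
```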

So what does this Question do? A Question is used to partition the data set. This class just records a column number, for example 0 for color, and a column value, for example green. Next, we define a match method, which is used to compare the feature value in an example to the feature value stored in the question. Let's see how. First of all, we define an init function, and we pass self, the column and the value as parameters. Next, we define a function called match, which compares the feature value in an example to the feature value in this question. Then we define a function called repr, which is just a helper method to print the question in a readable format.

Next, we are defining a function called partition. Well, this function is used to partition the data set. For each row in the data set, it checks whether the row matches the question or not. If it does, it adds the row to the true rows; if not, it adds it to the false rows. For example, as you can see, we partition the training data set based on whether the rows are red or not. Here we are calling Question and passing the values 0 and "Red" to it. So it will assign all the red rows to true_rows, and everything else will be assigned to false_rows. Fine.

Next, we'll define a Gini impurity function, and we'll pass a list of rows to it. It will just calculate the Gini impurity for the list of rows. Next, we are defining a function for information gain.
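Before getting into information gain, here is a minimal sketch of the Question class and the partition function as just described. The data rows are the reconstructed toy data set used for illustration, and the exact names and string format are assumptions rather than the notebook's actual code.

```python
# Reconstructed toy data set: each row is [color, diameter, label]
training_data = [
    ["Green", 3, "Mango"],
    ["Yellow", 3, "Mango"],
    ["Red", 1, "Grape"],
    ["Red", 1, "Grape"],
    ["Yellow", 3, "Lemon"],
]
header = ["color", "diameter", "label"]

def is_numeric(value):
    return isinstance(value, (int, float))

class Question:
    """Records a column number (e.g. 0 for color) and a value (e.g. 'Green')."""

    def __init__(self, column, value):
        self.column = column
        self.value = value

    def match(self, example):
        # Compare the feature value in the example to the value in this question:
        # numeric features use >=, everything else uses equality
        val = example[self.column]
        if is_numeric(val):
            return val >= self.value
        return val == self.value

    def __repr__(self):
        # Helper to print the question in a readable format
        condition = ">=" if is_numeric(self.value) else "=="
        return f"Is {header[self.column]} {condition} {self.value}?"

def partition(rows, question):
    """Split rows into those that match the question and those that do not."""
    true_rows, false_rows = [], []
    for row in rows:
        if question.match(row):
            true_rows.append(row)
        else:
            false_rows.append(row)
    return true_rows, false_rows

# Partition the training data on whether the rows are red
true_rows, false_rows = partition(training_data, Question(0, "Red"))
print(true_rows)
print(false_rows)
```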

So what this information gain function does is calculate the information gain as the uncertainty of the starting node minus the weighted impurity of the child nodes. The next function is find best split. Well, this function is used to find the best question to ask, by iterating over every feature and value and then calculating the information gain. For a detailed explanation of the code, you can find the code in the description given below.

All right. Next, we'll define a class called Leaf for classifying the data. It holds a dictionary of class, like mango, to how many times it appears in the rows from the training data that reach this leaf. All right, next is the Decision Node. A decision node asks a question. It holds a reference to the question and to the two child nodes, and on the basis of that you are deciding which node to add further to which branch.
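The Gini impurity and information gain functions described above can be sketched like this. It is an illustrative version with assumed names (`gini`, `info_gain`), using the reconstructed toy rows rather than the notebook's actual code.

```python
from collections import Counter

def gini(rows):
    """Gini impurity of a list of rows whose last column is the label."""
    counts = Counter(row[-1] for row in rows)
    return 1 - sum((n / len(rows)) ** 2 for n in counts.values())

def info_gain(left, right, current_uncertainty):
    """Uncertainty of the starting node minus the weighted impurity of the two children."""
    p = len(left) / (len(left) + len(right))
    return current_uncertainty - p * gini(left) - (1 - p) * gini(right)

# Reconstructed toy rows, already split on "Is color == Red?"
red_rows = [["Red", 1, "Grape"], ["Red", 1, "Grape"]]
other_rows = [["Green", 3, "Mango"], ["Yellow", 3, "Mango"], ["Yellow", 3, "Lemon"]]

current = gini(red_rows + other_rows)
print(round(current, 3))                                  # impurity before the split
print(round(info_gain(red_rows, other_rows, current), 3)) # gain from the split
```

A node with only one label has a Gini impurity of 0, and an even mix of two labels has 0.5, so a question that produces purer children yields a higher gain.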

All right, so next we are defining a function called build tree, and we pass our rows into it. This is the function that is used to build the tree. Initially, we defined all the various functions that we'll be using in order to build a tree. So let's start by partitioning the data set for each unique attribute, then we'll calculate the information gain and return the question that produces the highest gain, and on the basis of that we'll split the tree. So what we are doing here is partitioning the data set and calculating the information gain, and what this returns is the question that produces the highest gain.

All right. Now, if gain equals 0, return Leaf of rows. So if we are getting zero gain, then since no further question can be asked, it will return a leaf. Fine. Next, true_rows and false_rows equal partition of the rows and the question. If we reach this position, then we have already found a feature value which will be used to partition the data set. Then you will recursively build the true branch and similarly recursively build the false branch.
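Putting the pieces together, the recursive procedure just described can be sketched as below. This is a compact illustration under assumptions, not the notebook's actual code: the data rows are reconstructed from the narration, the names are my own, and candidate values are visited in sorted order so that ties are broken the same way on every run.

```python
from collections import Counter

training_data = [  # reconstructed toy data: [color, diameter, label]
    ["Green", 3, "Mango"], ["Yellow", 3, "Mango"], ["Red", 1, "Grape"],
    ["Red", 1, "Grape"], ["Yellow", 3, "Lemon"],
]

class Question:
    def __init__(self, column, value):
        self.column, self.value = column, value

    def match(self, row):
        val = row[self.column]
        if isinstance(val, (int, float)):
            return val >= self.value
        return val == self.value

def partition(rows, question):
    true_rows = [r for r in rows if question.match(r)]
    false_rows = [r for r in rows if not question.match(r)]
    return true_rows, false_rows

def gini(rows):
    counts = Counter(r[-1] for r in rows)
    return 1 - sum((n / len(rows)) ** 2 for n in counts.values())

def info_gain(left, right, current):
    p = len(left) / (len(left) + len(right))
    return current - p * gini(left) - (1 - p) * gini(right)

def find_best_split(rows):
    """Try every feature/value pair and keep the question with the highest gain."""
    best_gain, best_question = 0, None
    current = gini(rows)
    for col in range(len(rows[0]) - 1):          # skip the label column
        for val in sorted({r[col] for r in rows}):
            question = Question(col, val)
            true_rows, false_rows = partition(rows, question)
            if not true_rows or not false_rows:  # question does not divide the data
                continue
            gain = info_gain(true_rows, false_rows, current)
            if gain >= best_gain:
                best_gain, best_question = gain, question
    return best_gain, best_question

class Leaf:
    """Holds a dictionary of label -> number of training rows that reached this leaf."""
    def __init__(self, rows):
        self.predictions = dict(Counter(r[-1] for r in rows))

class Decision_Node:
    """Holds a question and references to the two child nodes."""
    def __init__(self, question, true_branch, false_branch):
        self.question = question
        self.true_branch = true_branch
        self.false_branch = false_branch

def build_tree(rows):
    gain, question = find_best_split(rows)
    if gain == 0:                                # no further question to ask
        return Leaf(rows)
    true_rows, false_rows = partition(rows, question)
    return Decision_Node(question, build_tree(true_rows), build_tree(false_rows))

def classify(row, node):
    """Follow the true or false branch until a leaf is reached."""
    if isinstance(node, Leaf):
        return node.predictions
    branch = node.true_branch if node.question.match(row) else node.false_branch
    return classify(row, branch)

tree = build_tree(training_data)
print(classify(["Green", 3, "Mango"], tree))
print(classify(["Yellow", 4, "Lemon"], tree))
print(classify(["Red", 1, "Grape"], tree))
```

Because the second and fifth training rows share the same features, no question can separate them, so the yellow example ends in a leaf that holds both labels.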

So we return Decision_Node, and inside that we pass the question, the true branch and the false branch. What it returns is a question node, which records the best feature and value to ask at this point. Fine.

Now that we have built our tree, next we'll define a print_tree function, which will be used to print the tree. So in this particular function we are printing our tree. Next is the classify function, which we'll use to decide whether to follow the true branch or the false branch, by comparing the feature value stored in the node to the example we are considering. And last, we'll print the prediction at the leaf. So let's execute it and see. Okay, so we have printed a leaf as well. Now that we have trained our algorithm on our training data set, it's time to test it.

So this is our testing data set. Let's finally execute it and see what the result is. So this is the result you will get. The first question asked by the algorithm is: is the diameter greater than or equal to 3? If it is true, then it will further ask if the color is yellow. If that is true again, then it will predict mango with a count of one and lemon with a count of one. And in case it is false, then it will just predict mango. Now, this was the true part. Next, coming to the case where the diameter is not greater than or equal to 3, that is, the answer is false: it'll just predict grape. Fine. Okay. So this was all about the coding part. Now, let's conclude this session.

But before concluding, let me just show you one more thing. There's a scikit-learn algorithm cheat sheet, which explains which algorithm you should use and when. It's built in a decision tree format, so let's see how it works. The first condition it checks is whether you have 50 samples or not. If you have more than 50 samples, then you move ahead. If you have fewer than 50, then you need to collect more data. If you have more than 50 samples, then you have to decide whether you want to predict a category or not. If you want to predict a category, then you further check whether you have labeled data or not. If you have labeled data, then that would be a classification problem.

If you don't have labeled data, then it would be a clustering problem. Now, if you don't want to predict a category, then what do you want to predict? A quantity. Well, if you want to predict a quantity, then it would be a regression problem. If you don't want to predict a quantity and you want to keep looking further, then you should go for dimensionality reduction. And still, if you don't want to just look, and predicting structure is not working, then you have tough luck for that.
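Since the cheat sheet is itself a small decision tree, the path described above can be written as a plain Python function. This is only a sketch of the flow as narrated here, with hypothetical parameter names; it is not part of scikit-learn.

```python
def suggest_problem_type(n_samples, predict_category=False, labeled=False,
                         predict_quantity=False, just_looking=False):
    """Walk the cheat-sheet questions in the order they are described."""
    if n_samples < 50:
        return "collect more data"
    if predict_category:
        # Labeled data means classification, otherwise clustering
        return "classification" if labeled else "clustering"
    if predict_quantity:
        return "regression"
    if just_looking:
        return "dimensionality reduction"
    return "tough luck"

print(suggest_problem_type(30))
print(suggest_problem_type(500, predict_category=True, labeled=True))
print(suggest_problem_type(500, predict_category=True))
print(suggest_problem_type(500, predict_quantity=True))
print(suggest_problem_type(500, just_looking=True))
```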

I hope this decision tree session clarifies all your doubts about the decision tree algorithm.

Now, we'll try to find out the answer to this particular question: why do we need random forest? Fine. So human beings learn from past experiences. But unlike human beings, a computer does not have experiences, so how does a machine take decisions? Where does it learn from? Well, a computer system actually learns from data, which represents some past experiences of an application domain. So now let's see how random forest helps in building up a learning model, with a very simple use case of credit risk detection. Now, needless to say, credit card companies have a very vested interest in identifying financial transactions that are illegitimate and criminal in nature.

And I would also like to mention that, according to the Federal Reserve Payments Study, Americans used credit cards to pay for 26.2 million purchases in 2012, and the estimated loss due to unauthorized transactions that year was US $6.1 billion. Now, in the banking industry, measuring risk is very critical because the stakes are too high. So the overall goal is actually to figure out who all can be fraudulent before too much financial damage has been done. For this, a credit card company receives thousands of applications for new cards, and each application contains information about an applicant, right? So here, as you can see, from all those applications what we can actually figure out are the predictor variables. Like: what is the marital status of the person? What is the gender of the person? What is the age of the person? And the status, which is whether the person is a default payer or a non-default payer.

So default payments are basically when payments are not made in time and according to the agreement signed by the cardholder. That account is then said to be in default. So you can easily figure out the history of the particular cardholder from this. Then we can also look at the time of payment, whether he has been a regular payer or a non-regular one, what the source of income is for that particular person, and so on and so forth. So to minimize loss, the bank actually needs a certain decision rule to predict whether to approve the loan of that particular person or not. Now here is where random forest actually comes into the picture.

All right. Now, let's see how random forest can actually help us in this particular scenario. We have randomly taken two parameters out of all the predictor variables that we saw previously. So we have two predictor variables here: the first one is the income and the second one is the age. And here, in parallel, two decision trees have been implemented upon those predictor variables. Let's first take the case of the income variable. Here we have divided income into three categories: the first one being a person earning over $35,000, the second from $15,000 to $35,000, and the third one earning in the range of $0 to $15,000.

Now, if a person

is earning over $35,000, which is a pretty good,

pretty decent income, we'll check

the credit history. And here the probability is that if a person is earning

a good amount, there is very low risk that he won't be able to pay

back, since he is already earning well. So the probability is that his loan

application will get approved, right? There is low risk

or moderate risk, but there's no real issue

of high risk as such. We can approve

the applicant's request here.

Now, let's move on and look

at the second category, where the person

is earning from $15,000 to $35,000.

Here the person may or may not pay back. So in such scenarios we'll look

at the credit history to see what

his previous record has been. If his previous

history has been bad, say he has been a defaulter

in previous transactions, we will definitely not consider

approving his request, and he will be at high risk, which

is not good for the bank. If the previous history of that particular

applicant is really good, then just to clear any doubt we will consider

another parameter as well, and that is debt. If he is already

in really high debt, then the risk again increases

and there are chances that he might not

repay in the future. So here we will not accept the request

of a person with high debt. If the person has

low debt and he has been a good payer

in his past history,

then there are chances that he will pay back,

and we can consider approving the request

of this particular applicant. Let's look at the third category, which is a person earning

from $0 to $15,000. Now, this is something

which actually raises an eyebrow, and this person

will fall into the category of high risk. All right. So the probability is that his loan application

would get rejected. So we get one final outcome from

this income parameter, right? Now let us look

at our second variable, that is age, which leads

to the second decision tree. Now, let us say

the person is young, right? So now we will check

whether he is a student. If he is a student, then

the chances are high that he won't be

able to repay, because he has

no source of earnings, right? So here the risk is too high

and the probability is that his loan

application will get rejected. Fine. Now, if the person is young and is not a student,

then we'll go on and look at another variable, that is, bank balance.

Now, let's see: if the bank balance

is less than 5 lakhs, the risk again rises

and the probability is that his loan

application will get rejected. Now, if the person

is young, is not a student, and his bank balance is greater than 5 lakhs,

so he has a pretty good and stable balance,

then the probability is that his application will get approved. Now

let us take another scenario: he's a senior, right? If he is a senior,

we'll go and check his credit history. How well has he done

in his previous transactions? What kind of a person is he, a defaulter

or a non-defaulter? Now, if his record is only

fair in his previous transactions,

then again the risk rises and the probability

of his application getting rejected

increases, right? And if he has been an excellent customer as per his transactions

in the previous history,

then again

there is the least risk and the probability is that his loan

application will get approved. So now here these two variables,

income and age, have led to two different decision trees, right? And these two different

decision trees have led to two different results. Now what random forest does is

compile these two different results

from these two different decision trees, and then finally it arrives

at a final outcome. That is how random

forest actually works, right? So that is the motive

of random forest. Now let us move forward and see

what random forest is, right? You can get an idea of the mechanism from the name

itself, random forest. A collection

of trees is a forest, which is probably why it is called

a forest, and here the trees are

being trained on subsets which are

selected at random. And therefore they are called

random forests. So a random forest is a collection, or an ensemble, of decision trees, right?

Normally a decision tree is built using the whole data

set, considering all features, but in a random forest

only a fraction of the rows is selected, and that too at random, and a particular

number of features, which are also selected

at random, are trained upon, and that is how the decision trees

are built.
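
As a rough illustration of that sampling step, here is a minimal Python sketch. The helper name `draw_tree_sample`, the `row_fraction` parameter, and the applicant records are all made up for illustration, and drawing rows with replacement is an assumption here; the talk only says that each tree sees a random fraction of the rows and a random set of features.

```python
import random

def draw_tree_sample(rows, feature_names, n_features, row_fraction, rng):
    """Pick the random rows and random features that one tree would train on."""
    n_rows = max(1, int(row_fraction * len(rows)))
    # Rows: a random fraction of the data, drawn with replacement (assumption).
    sampled_rows = [rng.choice(rows) for _ in range(n_rows)]
    # Features: a random subset, drawn without replacement.
    sampled_features = rng.sample(feature_names, n_features)
    return sampled_rows, sampled_features

# Hypothetical applicant records, loosely modelled on the loan example.
rows = [
    {"income": 40000, "age": 52, "debt": "low", "status": "non-defaulter"},
    {"income": 22000, "age": 31, "debt": "high", "status": "defaulter"},
    {"income": 12000, "age": 19, "debt": "low", "status": "defaulter"},
    {"income": 30000, "age": 45, "debt": "low", "status": "non-defaulter"},
    {"income": 36000, "age": 28, "debt": "high", "status": "non-defaulter"},
]
feature_names = ["income", "age", "debt"]

rng = random.Random(42)  # fixed seed so the draw is repeatable
for tree_number in range(3):
    sample, features = draw_tree_sample(rows, feature_names, 2, 0.6, rng)
    print(tree_number, features, [r["income"] for r in sample])
```

Each pass through the loop prints a different mix of rows and features, which is what makes the trees in the forest differ from one another.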

Right? So similarly a number

of decision trees will be grown, and each decision tree will result

in a certain final outcome, and random forest

will do nothing but

compile the results of all those decision trees

to bring up the final result. As you can see

in this particular figure, a particular instance

has gone through three different

decision trees, right? Tree one results in

a final outcome called Class A and tree two results in Class B.

Similarly, tree

three results in Class B. So random forest will compile the results

of all these decision trees and it will go by the call

of the majority vote. Now, since here two decision trees

have voted in favor of Class B,

that is decision trees 2 and 3,

the final outcome will

be in favor of Class B. And that is how random

forest actually works. Now, one really beautiful thing about

this particular algorithm is that it is one

of the versatile algorithms, capable of performing

both regression as well as classification. Now, let's try to understand

random forest further with a very beautiful example,

and this is my favorite one. Let's say you want to decide whether you want to watch Edge

of Tomorrow or not, right? In this particular scenario, you have two different

options to work on. Either

you can go straight to your best friend

and ask him,

all right, should I go for Edge

of Tomorrow or not, will I like this movie? Or you

can ask several of your friends, take their opinions

into consideration, and then based on the final result you can go out and watch Edge

of Tomorrow, right? So now let's take

the first scenario, where you go

to your best friend and ask

whether you should go

out to watch Edge of Tomorrow or not. Your friend will probably

ask you certain questions, the first one being

about genre. So let's say your friend asks you if you really like

adventure movies or not. You say yes, definitely, I would love to watch

an adventure movie. So the probability is that you will like Edge

of Tomorrow as well, since Edge of Tomorrow is

also a movie of the adventure and sci-fi

genre, right? Now let's say you do not like

adventure genre movies.

Then

the probability reduces, and you might

not like Edge of Tomorrow, right? So from here you can come

to a certain conclusion, right? Let's say your best friend puts

you into another situation where he asks you, do you like Emily Blunt,

and you say, definitely, I like Emily Blunt. Then he

puts another question to you:

do you like Emily Blunt

in the lead role? And you say yes. Then again, the probability rises that you will definitely

like Edge of Tomorrow as well, because Edge of Tomorrow

has Emily Blunt in the main lead cast. And if you say, oh, I do not like

Emily Blunt, then again, the probability reduces that you would like Edge

of Tomorrow, right?

So this is one way, where you have one decision tree

and your final outcome. Your final decision will be

based on your one decision tree, or you can say your final

outcome will be based on just one friend. Now, you're definitely not

really convinced. You want to consider the opinions

of your other friends too, so that you can make

a very precise and crisp decision, right? So you go out and approach another

bunch of friends of yours. Now let's say you go

to three of your friends and you ask them

the same question, whether you would like to watch

Edge of Tomorrow or not. So you approach three friends: friend

one, friend two, and friend three.

Now, you will consider

each of their responses, and your decision

will now depend on the compiled results of all

three of your friends, right? Now, let's say you go

to your first friend and ask him whether you would like

to watch Edge of Tomorrow or not, and your first friend

puts one question to you: did you like Top Gun? And you say yes, definitely, I did like the movie

Top Gun. Then the probability is that you would like

Edge of Tomorrow as well, because Top Gun is

a military action drama, which also stars Tom Cruise.

So now again the probability

rises that yes, you will like Edge

of Tomorrow as well. And if you say no, I didn't like

Top Gun, then again the chances are that you wouldn't like Edge

of Tomorrow, right? And then another question

that he puts to you is,

do you really like

to watch action movies? And you say yes, I would love to watch

them. Then again, the chances are that you would like

to watch Edge of Tomorrow. So from friend one you can come

to one conclusion. Here, since the ratio of liking the movie

to not liking it is 2 to 1, the final

result is that you would like Edge of Tomorrow.

Now you go to your second friend

and you ask the same question. Your second friend

asks you, did you like Far and Away when we went

out and watched it last time? And you say no, I really

didn't like Far and Away. Then he would say,

you are definitely going to like Edge of Tomorrow. Why so? Because Far

and Away, since most of you

might not know it, is of the romance genre, and it revolves around a girl and a guy falling in love

with each other and so on. So the probability is that you would like

Edge of Tomorrow. Then he asks you another question: did you like Oblivion, and do you really like

to watch Tom Cruise? And you say yes. Again, the probability is that you would like

to watch Edge of Tomorrow. Why? Because Oblivion

again is a science fiction film starring Tom Cruise, full

of strange experiences,

where Tom Cruise is

the savior of mankind. Well, that is the same kind of plot

as in Edge of Tomorrow. So here it is a pure yes that you would like

to watch Edge of Tomorrow. So you get a second decision

from your second friend. Now you go to your third

friend and ask him. Probably your third friend is

not really interested in having any sort

of conversation with you, so he simply asks you, did you

like Godzilla? And you say no, I didn't like Godzilla. So he says, definitely

you wouldn't like Edge of Tomorrow. Why so? Because Godzilla is also

a science fiction movie from the adventure genre.

So now you have got

three results from three different decision trees,

from three different friends. Now you compile the results

of all those friends and then you make

a final call on whether you would like to watch Edge

of Tomorrow or not. So this is a very real-life

and very interesting example of how you can actually

apply random forest in practice, right?

Any questions so far? No? That's good, and then

we can move forward.

Now let us look

at the various domains where random forest

is actually used. Because of its versatility,

random forest is used in various diverse domains, be it banking, be it medicine, be it land use,

be it marketing; you name it, and random forest is there. So

in banking in particular, random forest is

used to work out whether the applicant

will be a defaulter or a non-defaulter, so that the bank can accordingly approve or reject

the loan application, right? That is how random forest

is used in banking. Talking about medicine, random forest is widely used in the medical field

to predict beforehand the probability that a person will have

a particular disease, right? So it's used to look

at various disease trends. Let's say you want to figure

out the probability that a person

will have diabetes or not. What would you do? You'd probably look

at the medical history of the patient and then

you will see, all right.

This has been

the glucose concentration. What was the BMI? What were the insulin levels in the patient in the

previous three months? What is the age

of this particular person? And you'll build different

decision trees based on each one of these predictor variables, and then you'll finally

compile the results of all those trees

and make a

final decision as to whether the person

will have diabetes in the near future or not. That is how random

forest is used in the medical sector. Now, moving on, random forest is also

used to decide land use. For example, I want to set

up a particular industry in a certain area. So what would I probably

look for? I'd look at what the

vegetation is over there, what the urban

population is over there, right, and how far it is

from the nearest modes of transport, like

the bus station or the railway station,

and accordingly

I will split my parameters and make a decision

on each one of these parameters, and finally I'll compile

my decisions on all these parameters, and that

will be my final outcome. So that is how I

am finally going to predict whether I should put my industry at this particular

location or not, right? So these three examples

have mainly been classification problems, because we are

trying to classify, we're actually

trying to answer this question of whether or not, right? Now, let's move forward and look at how marketing makes use

of random forest.

In marketing in particular, we try to identify

customer churn. This is treated here as

a regression kind of problem, right? Now let's see how. Customer churn is nothing but

the number of customers

you are losing, the customers who are going

out of your market. Now you want to identify what your customer churn will be

in the near future. Most e-commerce companies are

using this, like Amazon, Flipkart, etc. They look in particular

at each of your behaviors: what your past history has been, what

your purchasing history has been, what you like

based on your activity around certain things, around

certain ads, around certain discounts, or around certain kinds

of products, right? If you like a particular top,

your activity will be higher around that particular top.

So that is how they track each

and every move of yours, and then

they try to predict whether you will be

moving out or not. That is how they identify

customer churn. So these are the various domains where random forest

is used, and this is not the complete list; there

are numerous other examples which are actually using random forest, and that is what makes

it so special. Now, let's move

forward and see how random forest actually works.

Right. So let us start with the random

forest algorithm first. Let's go step by step through how the random

forest algorithm works. The first step is

to select m features at random from T, where m is less than T. Here T is the total number

of predictor variables that you have in your data set, and out of

those total predictor variables you randomly select

some m features. Now, why are we selecting

only a few features? The reason is that if you select all

the predictor variables,

then each of your decision trees will be the same. So the model is not

learning anything new. It is learning

the same thing again, because all those decision trees

will be similar, right? But suppose you split up

your predictor variables and randomly select

only a few of them.

Let's say there are 14

variables in total and out of those you randomly

pick just three, right? Then every time you will get

a new decision tree, so there will be variety, right? So the classification model

will be much more intelligent

than the previous one. Now it has got

varied experiences, so it will make

different decisions each time. And then when you compile

all those different decisions, you get a more accurate and efficient result, right? So the first important step

is to select a certain number of features out of all

the features. Now, let's move on to

the second step. For any node d, the task is to calculate

the best split at that point.

You know how decision trees

are built: you pick the most

significant variable, right? And then you split

that particular node into child nodes; that is how the split

takes place, right? You will do it

using the m variables that you have selected. Let's say you

have selected three; then you will evaluate

the split using those three variables

in one particular decision tree, right? The third step

is to split the node into two daughter nodes. You could split

your root node into as many nodes as you want, but here

we'll split our node into two daughter nodes, this or that, so it will be an answer

in terms of this or that, right? Our fourth step is

to repeat the three steps that we've done previously, and we repeat

all this splitting until we have reached

n number of nodes, right? So we repeat until we have reached

the leaf nodes of the decision tree. That is how we do it, right?

Now, after these four steps, we have

our one decision tree.

But random forest is

about multiple decision trees. So here our fifth step

comes into the picture, which is to repeat

all these previous steps D times. Here

D is the number of decision trees. Let's say I want to build

five decision trees. So I will run all

the previous steps 5 times; here the iteration runs

five times, right? Now, once I have created these five decision trees,

my task is still not complete. My final task is

to compile the results of all these five

different decision trees and make a call based on majority

voting, right? As you can see in this picture, I have n different instances. Then I create

n different decision trees. And finally I compile

the results of all these n different decision trees and take my call

on the majority vote, right?
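
The five steps can be sketched end to end in a few lines of plain Python. This is only a toy under stated assumptions: each "tree" is cut down to a single split (a decision stump) chosen by counting misclassified rows, rows are resampled with replacement, and the loan records and all function names are hypothetical. A real random forest grows full trees and usually relies on a library implementation.

```python
import random
from collections import Counter

def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, features, target):
    """Steps 2-4, reduced to one split: keep the feature with the fewest errors."""
    best = None
    for feature in features:
        # For each value of the feature, predict the majority label seen with it.
        groups = {}
        for row in rows:
            groups.setdefault(row[feature], []).append(row[target])
        rule = {value: majority(labels) for value, labels in groups.items()}
        errors = sum(rule[row[feature]] != row[target] for row in rows)
        if best is None or errors < best[0]:
            best = (errors, feature, rule)
    _, feature, rule = best
    default = majority([row[target] for row in rows])  # fallback for unseen values
    return feature, rule, default

def predict_stump(stump, row):
    feature, rule, default = stump
    return rule.get(row[feature], default)

def fit_forest(rows, features, target, n_trees, m_features, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):                              # step 5: repeat D times
        chosen = rng.sample(features, m_features)         # step 1: m of the T features
        sample = [rng.choice(rows) for _ in rows]         # random rows, with replacement
        forest.append(fit_stump(sample, chosen, target))
    return forest

def predict_forest(forest, row):
    """Final step: every tree votes and the majority wins."""
    return majority([predict_stump(stump, row) for stump in forest])

# Hypothetical loan records; values are invented for illustration.
rows = [
    {"income": "high", "history": "good", "debt": "low", "decision": "approve"},
    {"income": "high", "history": "bad", "debt": "low", "decision": "approve"},
    {"income": "mid", "history": "good", "debt": "low", "decision": "approve"},
    {"income": "mid", "history": "good", "debt": "high", "decision": "reject"},
    {"income": "mid", "history": "bad", "debt": "low", "decision": "reject"},
    {"income": "low", "history": "good", "debt": "low", "decision": "reject"},
    {"income": "low", "history": "bad", "debt": "high", "decision": "reject"},
]
features = ["income", "history", "debt"]

forest = fit_forest(rows, features, "decision", n_trees=5, m_features=2)
applicant = {"income": "mid", "history": "good", "debt": "low"}
print([predict_stump(s, applicant) for s in forest], predict_forest(forest, applicant))
```

Printing the individual votes next to the final answer makes the majority-vote step visible: trees may disagree, and the forest reports whichever label most of them chose.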

So whatever my

majority vote says, that will be my final result. So this is basically an overview

of how the random forest algorithm actually works. Let's have a look

at this example to get a much better understanding

of what we have learned. Let's say I have

this data set, which consists of four

different attributes, right? It contains

the weather information of the previous 14 days,

from D1 to D14, and

outlook, humidity, and wind give me

the weather conditions of those 14 days. And finally I have play, which is my target variable:

whether the match did take place on that particular day

or not, right? Now, my main goal is to find out whether the match

will actually take place given

these weather conditions on any particular day. Let's say the outlook

is rainy that day, the humidity is high,

and the wind is very weak. So now I need to predict whether I will be able

to play the match

that day or not. All right. So this is

the problem statement. Fine. Now, let's see how random forest

is used to solve it. Here the first step

is to split my entire data set

into subsets. I have split my entire

14 records into smaller subsets, right?

Now, these subsets may or may not overlap; for instance, there is some

overlap between D1 to D3 and D3 to D6. Fine, there is an overlap of D3. So it might happen

that there is overlap, and you need not really worry

about it, but you do have to make sure that all those subsets are

actually different, right? So here I have taken

three different subsets: my first subset consists of D1 to D3, my second subset

consists of D3 to D6, and my third subset

consists of D7 to D9. Now I will first focus

on my first subset. Here, let's say that on that particular day the outlook was

overcast. Fine, if yes, it was overcast,

then the probability is that the match will take place.

Overcast is basically

when the weather is very cloudy. If that is the condition,

then the match will definitely take place. And let's say

it wasn't overcast. Then you consider the

second most significant option,

which is the wind, and you make

a decision based on whether the wind was weak or strong.

If the wind was weak, then you will definitely

go out and play the match; otherwise you would not. So

now the final outcome of this decision

tree will be play, because here the ratio

of play to no play is 2 to 1. So we get a decision

from the first decision tree. Now, let us look

at the second subset. The second subset has

different records.

So that is why this decision

tree is completely different from what we saw for our first subset. Let's say if it was overcast,

then you will play the match. If it isn't overcast,

then you would go and look at humidity. It gets split further into two,

whether it was high or normal. We'll take the first case: if the humidity was high

and the wind was weak, then you will play

the match; else, if humidity was high

but the wind was too strong, then you would not go out

and play the match, right? Now, let us look at the second daughter

node of humidity: if the humidity was normal and the wind was weak, then you will definitely go out

and play the match; otherwise you won't go out

and play the match. So here, if you look

at the final result, the ratio of play to no play

is 3 to 2, so again

the final outcome

is play, right? So from the second subset we get the final

decision of play. Now, let us look at our third subset, which consists of D7

to D9. Here, if again the outlook is overcast,

then you will play the match.

Else you will go

and check the humidity. If the humidity is

really high, then you won't play the match; else you will play the match.

Again the outcome is play, because the ratio

of play to no play is 2 to 1, right? So three different subsets,

three different decision trees, three different outcomes, and one final outcome

after compiling all the results from these three different

decision trees. I hope this gives a better perspective

and a better understanding of random forest and

how it really works. All right. So now let's have a look at various features

of random forest, right? The first

and foremost feature is that it is one of the most accurate

learning algorithms, right? Why is that so? Because single decision trees

are prone to having high variance or high bias, and

in contrast, random

forest averages

the variance across the decision trees.
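
The averaging argument can be written out as a short formula. This is a sketch under an idealized assumption, namely that the n trees T_1, ..., T_n each have variance X and make independent errors; the symbols T_i and ρ are introduced here for illustration. In practice trees trained on the same data are correlated, and with pairwise correlation ρ the reduction is smaller, as the second line shows.

```latex
\[
\operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} T_i\right) = \frac{X}{n}
\qquad \text{(trees with independent errors)}
\]
\[
\operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} T_i\right) = \rho X + \frac{(1-\rho)\,X}{n}
\qquad \text{(pairwise correlation } \rho \text{ between trees)}
\]
```

The second form also suggests why the random choice of rows and features matters: it lowers ρ, which is the part of the variance that adding more trees cannot remove.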

So let's say the variance is

X for a single decision tree. For random forest, let's say we have

built n decision trees in parallel. My variance

gets averaged over them, and my final variance

ideally becomes X upon n. That is how the overall variance

goes down compared to other algorithms. Now the second most

important feature is that it works well

for both classification and regression problems, and of the algorithms I have come

across so far, this is the only one which works equally

well for both of them, be it a classification kind

of problem or a regression kind of problem, right? Then, it runs efficiently

on large databases.

So basically it's

really scalable, whether you work with

a small amount of data or with a really

huge volume of data, right? So that's a very

good thing about it. Then the fourth

important point is that it requires almost

no input preparation. Why am I saying this? Because it has got

certain implicit methods which take care

of the outliers and the missing data, and you really don't have to

worry about all that while you are in the stage

of input preparation. Random forest is

there to take care of all that. Next, it performs implicit

feature selection, right? While we are building

multiple decision trees, it has an implicit method which automatically picks

some random features out

of all your parameters

and then goes on

building

different decision trees. For example, if you just give

one simple command saying, all right,

I want to build

500 decision trees, no matter how, random forest

will automatically take care of it and build all

those 500 decision trees, and all 500 decision trees

will be different from each other. This is because it has

implicit methods which automatically

select different parameters out of all the variables

that you have, right? Then, it can easily be grown

in parallel. Why is that so? Because we are building multiple

decision trees, and all those decision

trees are built in parallel. If you say I want a thousand

trees to be built, all those thousand trees are

built in parallel. That is how the computation

time comes down. Right, and the last point is that it has got methods

for balancing error in unbalanced data sets. Now, what exactly

are unbalanced data sets? Let me give

you an example. Let's say you're working

on a data set, fine, and you create a random

forest model and get 90% accuracy immediately.

Fantastic, you think, right? So now you start diving

deeper, and you discover that 90% of that data actually

belongs to just one class. Damn! Your entire data set, your entire decision,

is biased towards just one particular class. Random forest

takes care of this, and it is not biased towards any particular decision

tree or any particular variable or any class. It has got methods

which look after this and balance

the errors in your data sets.
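
To make the 90% trap concrete, here is a small Python sketch with invented labels. It shows that a "model" which always predicts the majority class already scores 90% accuracy while catching no defaulters at all, and then computes one common remedy: class weights inversely proportional to class frequency. The weighting formula is an assumption about how such balancing might be done (it matches the "balanced" heuristic used by some libraries), not something stated in the talk.

```python
from collections import Counter

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def recall(predicted, actual, positive):
    """Fraction of the true `positive` cases that were actually caught."""
    caught = sum(p == a == positive for p, a in zip(predicted, actual))
    return caught / sum(a == positive for a in actual)

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}

# Hypothetical, heavily unbalanced data: 90 non-defaulters, 10 defaulters.
labels = ["non-defaulter"] * 90 + ["defaulter"] * 10
always_majority = ["non-defaulter"] * len(labels)

print(accuracy(always_majority, labels))             # 0.9
print(recall(always_majority, labels, "defaulter"))  # 0.0
print(balanced_class_weights(labels))
```

With these weights a mistake on a defaulter counts nine times as much as a mistake on a non-defaulter, so a learner can no longer look good by ignoring the rare class.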

So that's pretty much it for the features

of random forest. What is the KNN algorithm? Well, K-nearest neighbors

is a simple algorithm that stores all

the available cases and classifies new data or cases based

on a similarity measure. It suggests that if you are similar

to your neighbors, then you are one of them, right? For example, if an apple looks more similar

to a banana, an orange, or a melon than to a monkey, a rat, or a cat, then most likely the apple

belongs to the group of fruits.

All right. Well, in general KNN is used

in search applications where you are looking

for similar items, that is, when your task is

some form of "find items similar to this one". You call this search

a KNN search. But what is this K in KNN? Well, K denotes the number

of nearest neighbors which vote on the class

of the new data or the testing data. For example, if k equals 1, then the testing

data is given the same label as the closest example

in the training set. Similarly, if k equals 3, the labels of the three closest examples

are checked and the most common label is assigned

to the testing data. So this is what the K in the KNN algorithm means.

So moving on ahead, let's see some

example scenarios where KNN is used

in the industry. Let's see

the industrial applications of the KNN algorithm, starting

with recommender systems. Well, the biggest use case of KNN search

is a recommender system. A recommender system is

like an automated form of a shop counter guy: when you

ask him for a product,

he not only shows you the product but also suggests or displays

a relevant set of products which are related to the item you're already interested

in buying. The KNN algorithm applies to recommending

products, like on Amazon, or to recommending media, like in the case of Netflix, or even

to recommending advertisements to display to a user. If I'm not wrong, almost all

of you must have used Amazon for shopping, right? So just to tell you, more

than 35% of Amazon.com's revenue is generated by

its recommendation engine. So what's their

strategy? Amazon uses recommendations as

a targeted marketing tool in both its email campaigns and on most of its website pages. Amazon will

recommend many products from different categories based

on what you have browsed, and it will put those products

in front of you which you are likely to buy, like the frequently

bought together option that comes at the bottom

of the product page to tempt you into buying the combo.

Well, this recommendation

has just one main goal, that is, to increase average

order value, or to upsell and cross-sell customers

by providing product suggestions based on the items

in the shopping cart or on the product they are

currently looking at on the site. The next industrial

application of the KNN algorithm is concept search, or searching semantically

similar documents and classifying documents

containing similar topics. As you know, the data on the internet

is increasing exponentially every single second. There are billions and billions

of documents on the internet, and each document

contains multiple

potential concepts. Now, this is a situation where the main problem

is to extract concepts from a set of documents. As each page could have

thousands of combinations that could be potential concepts,

an average document could have millions of concepts. Combine that with the vast amount

of data on the web, and we are talking

about an enormous amount of data and samples.

So what we need is

to find the concepts in this enormous amount

of data and samples, right? For this purpose, we'll be using the KNN

algorithm. More advanced examples could include handwriting

detection, as in OCR, or image recognition,

or even video recognition. All right. So now that you know

various use cases of the KNN algorithm, let's proceed and see

how it works. So how does

a KNN algorithm work? Let's start by plotting

these blue and orange points on our graph. The blue points

belong to class A and the orange ones

belong to class B. Now you get a star as a new point,

and your task is to predict whether this new point

belongs to class A or to class B.

So to start the prediction,

the very first thing that you have to do is

select the value of K. Just as I told you, the K in the KNN

algorithm refers to the number of nearest neighbors that you

want to select. For example, in this case k equals 3. What does it mean? It means that I am selecting the three points which are the least distance

from the new point, or you can say I am selecting

the three points which are closest to the star. Well, at this point

you might ask how you will calculate

the least distance. So once you

calculate the distances, you will get one blue

and two orange points which are closest to this star.

Now, since in this case we have a majority

of orange points, you can say that for k equals 3 the star

belongs to class B, or you can say that the star is more similar

to the orange points. Moving on ahead.

Well, what if K equals

6? Well, for this case, you have to look

for six different points which are closest to this star. So in this case,

after calculating the distance, we find that we have

four blue points and two orange points which are closest

to the star. Now, as you can see, the blue points are

in the majority, so you can say that for K equal to

6 this star belongs to class A, or the star

is more similar to the blue points. So by now, I guess you know

how a KNN algorithm works and what the significance

of K is in the KNN algorithm. So how will you

choose the value of K? Keep in mind that K is

the most important parameter in the KNN algorithm. So, let's see: when you build

a k-nearest-neighbor classifier, how will you choose

a value of K? Well, you might have

a specific value of K in mind or you could divide up

your data and use something like cross-validation technique

to test several values of K in order to determine which works best for your data.
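
To make the idea of testing several values of K concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the toy points, the helper names knn_predict and loo_accuracy, and the use of leave-one-out validation as the cross-validation scheme are all made up for this example and are not the code used later in this session.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy: each point is predicted from all the others."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, point, k) == label
    return hits / len(data)

# Hypothetical toy data: two well-separated clusters of four points each
data = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.8, 1.3), "A"), ((1.1, 1.4), "A"),
        ((3.0, 3.1), "B"), ((3.2, 2.9), "B"), ((2.8, 3.3), "B"), ((3.1, 3.4), "B")]

# Score a few candidate values of K and keep the best one
scores = {k: loo_accuracy(data, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On this toy data the small values of K classify every held-out point correctly, while K equal to 7 pulls in more points from the wrong cluster than from the right one, which is why K has to be tried out rather than guessed.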

For example, if n equals

2,000 cases, then the optimal value

of K lies somewhere between 1 and 19. But yes, unless you try it

you cannot be sure of it. So, you know how the algorithm

is working on a higher level. Let's move on and see how things are predicted

using KNN algorithm. Remember I told you the KNN algorithm uses

the least distance measure in order to find

its nearest neighbors. So let's see

how these distances are calculated. Well, there are

several distance measures which can be used. To start with, we'll mainly

focus on Euclidean distance and Manhattan distance

in this session. So what is

this Euclidean distance? Well, the Euclidean distance

is defined as the square root of the sum of squared differences

between a new point x and an existing point y. So for example, here we

have points P1 and P2. Point P1 is (1, 1) and point P2 is (5, 4). So what is the Euclidean

distance between both of them? So you can say

that euclidean distance is a direct distance

between two points.

So what is the distance

between the points P1 and P2? We calculate it as

(5 minus 1) squared plus (4 minus 1) squared, and then we take the square root

of that, which results in 5. So next is

the Manhattan distance. Well, the Manhattan distance is

used to calculate the distance between real vectors using the sum of their absolute

difference in this case.
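
Both distance measures can be checked with a few lines of Python. This is just a small sketch for the two points P1 = (1, 1) and P2 = (5, 4) from the slide; the function names euclidean and manhattan are chosen here for illustration.

```python
import math

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

p1, p2 = (1, 1), (5, 4)
print(euclidean(p1, p2))  # 5.0
print(manhattan(p1, p2))  # 7
```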

The Manhattan distance

between the points P1 and P2 is the absolute value of 5 minus 1

plus the absolute value of 4 minus 1, which results in 4 plus 3, that is, 7. So this slide shows

the difference between euclidean and Manhattan distance

from point A to point B. So euclidean distance is

nothing but the direct or the least possible distance

between A and B. Whereas the Manhattan distance

is the distance between A and B measured along the axes

at right angles. Let's take an example and see how things are predicted

using the KNN algorithm, or how the KNN

algorithm works. Suppose

we have a data set

which consists of the height, weight, and T-shirt size

of some customers. Now when a new customer

comes, we only have their height and weight as the information.

Now our task is to predict the T-shirt size

of that particular customer. So for this we'll be using

the KNN algorithm. So the very first thing

we need to do is calculate

the Euclidean distance. So now you have new data

with a height of 161 centimeters and a weight of 61 kg. So the very first thing that we'll do is calculate

the Euclidean distance to an existing customer of height 158 and weight 58, which is nothing

but the square root of (161 minus 158) squared plus (61 minus 58) squared,

and the square root of that is 4.24. Let's drag and drop it. So these are the

Euclidean distances to the other points. Now, let's suppose K equals

5. Then what the algorithm does is search

for the five customers closest to the new customer, that is, most similar

to the new data in terms of its attributes.
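
Here is a rough sketch of that search in Python. Please note the assumptions: only the new customer (161 cm, 61 kg) and the existing customer (158 cm, 58 kg) come from this example; the size given to that customer, the other rows in the customers table, and the variable names are illustrative values made up for this sketch, not the actual table on the slide.

```python
import math
from collections import Counter

# (height cm, weight kg, size). Illustrative values, not the slide's actual table.
customers = [
    (158, 58, "M"), (160, 59, "M"), (160, 60, "M"), (163, 60, "M"),
    (163, 61, "M"), (160, 64, "L"), (163, 64, "L"), (165, 61, "L"),
]
new_customer = (161, 61)
k = 5

def distance(row):
    """Euclidean distance from a customer row to the new customer."""
    return math.hypot(row[0] - new_customer[0], row[1] - new_customer[1])

# Keep the k closest customers and let them vote on the size
nearest = sorted(customers, key=distance)[:k]
votes = Counter(size for _, _, size in nearest)
prediction = votes.most_common(1)[0][0]

print(round(distance(customers[0]), 2))  # 4.24
print(prediction)                        # M
```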

Let's find the five

smallest Euclidean distances. So these are the distances which we are going

to use: one, two, three, four, and five. So let's rank them

in order. This is first, this is second, this is third, then this one

is fourth, and again, this one is fifth. So that's our order. So for K equal to 5 we

have four T-shirts which come under size

M and one T-shirt which comes under size L. So obviously the best guess,

or the best prediction, for the T-shirt size at a height of

161 centimeters and a weight of 61 kg is M. Or you can say

that the new customer fits into size M. Well, this was all

about the theoretical session. But before we drill down

to the coding part, let me just tell you why people

call KNN a lazy learner. Well, KNN for classification is a very

simple algorithm, but that's not why it is

called lazy. KNN is a lazy learner because it doesn't learn

a discriminative function from the training data. Instead, it

memorizes the training data. There is no learning phase

for the model, and all of the work happens at the time

a prediction is requested. So that's the reason

why KNN is often referred to as a lazy learning algorithm. So this was all about

the theoretical session now, let's move on

to the coding part. So for the practical implementation in

the hands-on part, I'll be using

the Iris data set. This data set consists

of 150 observations. We have four features and one class label.

The four features are the sepal length, sepal width,

petal length, and petal width, whereas the class label

decides which flower belongs to which category. So this was the description

of the data set which we are using. Now, let's move on and see

the step-by-step solution

to perform the KNN algorithm. So first, we'll start

by handling the data. What we have to do is

open the data set from the CSV format and split the data set

into train and test parts.

Next, we'll take up similarity, where we

have to calculate the distance between two data instances. Once we calculate the distance,

next we'll look for the neighbors and select the K neighbors which have the least

distance from a new point. Now once we get our neighbors, we'll generate a response

from that set of data instances. So this will decide whether the new point belongs

to class A or class B. Finally, we'll create

the accuracy function, and in the end we'll tie it all together

in the main function. So let's start with our code for implementing the KNN

algorithm using Python. I'll be using Jupyter Notebook with Python

3.0 installed on it. Now, let's move on and see how the KNN algorithm

can be implemented using Python. So here's my Jupyter notebook, which is a web-based interactive

computing notebook environment, with Python 3.0 installed on it. So it's launching. So

here's our Jupyter notebook, and we'll be writing

our Python code in it. So the first thing that we need to do is

load our file. Our data is in CSV format

without a header line or any quotes. We can open

the file with the open function and read the data lines

using the reader function

in the csv module. So let's write the code

to load our data file. Let's execute the Run button. So once you execute

the Run button, you can see the entire training

data set as the output. Next, we need to split the data

into a training data set that KNN can use to make

predictions and a test data set that we can use to evaluate

the accuracy of the model. So we first need to convert

the flower measurements that were loaded as

strings into numbers that we can work with. Next, we need to split the data set

randomly into train and test. A ratio of 67 to 33 for

train to test is a standard ratio which is used for this purpose. So let's define a function called load data set that loads a CSV

with the provided file name and splits it

randomly into training and test data sets using

the provided split ratio.
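
A loader of this kind might look roughly like the sketch below. It is written under a few assumptions: the name load_dataset, its parameters, and the explicit random generator are choices made for this illustration and are not necessarily the notebook's code. The function takes any iterable of CSV lines, so the sketch can run on three inlined rows in the Iris layout instead of needing a file on disk.

```python
import csv
import io
import random

def load_dataset(lines, split, rng):
    """Parse CSV rows, convert the four measurements to float, and split randomly."""
    training_set, test_set = [], []
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        record = [float(x) for x in row[:4]] + [row[4]]
        # Each row goes to the training set with probability `split`
        (training_set if rng.random() < split else test_set).append(record)
    return training_set, test_set

# Three rows in the Iris CSV layout, inlined so the sketch runs without a file
sample = io.StringIO("5.1,3.5,1.4,0.2,Iris-setosa\n"
                     "7.0,3.2,4.7,1.4,Iris-versicolor\n"
                     "6.3,3.3,6.0,2.5,Iris-virginica\n")
train, test = load_dataset(sample, 0.67, random.Random(1))
print(len(train), len(test))
```

With a real file you would pass an open file object instead of the inlined sample. Because the split is random per row, the exact train and test counts change from run to run unless the generator is seeded.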

So this is our function, load

data set, which takes the file name, split ratio, training data set, and testing data

set as its inputs. All right. So let's execute the Run button

and check for any errors. So it's executed

with zero errors. Let's test this function. So here are our training

set, testing set, and load data set. So this is our function

load data set, and inside it we are passing our Iris data file

with a split ratio of 0.66, and the training data set

and test data set. Let's see what our training data

set and test data set are divided into. So

it's giving a count of the training data set

and testing data set. The total number

of training rows it split into is

97, and the total number of test rows we have is 53.

All right. Okay, so our function load

data set is performing well. So let's move

on to step two, which is similarity. In order to make predictions, we need to calculate

the similarity between any two given data instances. This is needed so that we can locate the K most

similar data instances in the training data set and

in turn make a prediction. Given that all four flower measurements

are numeric and have the same units,

we can directly use

the Euclidean distance measure, which is nothing

but the square root of the sum of squared differences

between two arrays of numbers. Additionally,

we want to control which fields to include

in the distance calculation.

So specifically, we only want

to include the first four attributes. So our approach will be

to limit the Euclidean distance to a fixed length. All right. So let's define

our Euclidean function. So this is our Euclidean

distance function, which takes instance

one, instance two, and length as parameters. Instance one and instance two are the two points between which you want to calculate

the Euclidean distance, whereas length denotes how many attributes

you want to include. Okay. So that's our

Euclidean function. Let's execute it. It's executing fine

without any errors. Let's test the function.

Suppose data one, the first instance, consists

of the data point 2, 2, 2 and belongs to class A, and data two consists

of 4, 4, 4 and belongs to class B. So when we calculate

the Euclidean distance from data one to data two, what we have to do is

consider only the first three features of each.
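
A distance function with a length parameter, as described here, could be sketched like this. The snake_case name euclidean_distance and the variable names are assumptions for this illustration; the two test instances are the ones from the example.

```python
import math

def euclidean_distance(instance1, instance2, length):
    """Euclidean distance over the first `length` fields only."""
    distance = 0.0
    for i in range(length):
        distance += (instance1[i] - instance2[i]) ** 2
    return math.sqrt(distance)

data1 = [2, 2, 2, "a"]
data2 = [4, 4, 4, "b"]
# Only the first three fields are numeric, so the class label is left out
print(round(euclidean_distance(data1, data2, 3), 3))  # 3.464
```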

All right. So let's print the distance.

As you can see here, the distance comes

out to be 3.464. So this distance is nothing

but the Euclidean distance, and it is calculated as the square

root of (4 minus 2) squared plus (4 minus 2) squared plus (4 minus 2) squared, that is,

3 times (4 minus 2) squared, which is 12, and the square root of 12 is

3.464. All right. So now that we have calculated

the distance, we need to look for the K nearest neighbors. Now that we

have a similarity measure, we can use it to collect

the K most similar instances for a given unseen instance. Well, this is

a straightforward process of calculating the distance

to all the instances and selecting the subset with

the smallest distance values.

So for that we will be

defining a function, get neighbors, which will return

the K most similar neighbors from the training set

for a given test instance. All right, so this is

how our get neighbors function looks. It takes the training data set, a test instance,

and K as its input. Here, K is nothing but the number of nearest neighbors

you want to check for. All right. So basically what

you'll be getting from this get neighbors

function is the K different points having the least Euclidean distance

from the test instance.

All right, let's execute it. So the function executed

without any errors. So let's test our function. Suppose the training data set

includes the data 2, 2, 2, which belongs to class A, and another data point,

4, 4, 4, which belongs to class B, and our test

instance is 5, 5, 5. Now we have to predict whether this test instance

belongs to class A or class B. All right, for K equal to 1

we have to find its nearest neighbor and predict whether this test instance

will belong to class A or class B. All right. So let's execute the Run button. All right. So on executing

the Run button, you can see that we have the output

4, 4, 4 and B. Our new instance 5, 5, 5 is closest to 4, 4,

4, which belongs to class B.
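
The neighbor search just tested can be sketched as follows. This is one possible way to write it, not necessarily the notebook's version: the names get_neighbors and euclidean_distance are assumed, and the sketch assumes the test instance holds features only, with no class label at the end.

```python
import math

def euclidean_distance(a, b, length):
    """Euclidean distance over the first `length` fields."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(length)))

def get_neighbors(training_set, test_instance, k):
    """Return the k training rows closest to the test instance."""
    length = len(test_instance)  # test_instance holds features only (assumption)
    by_distance = sorted(
        training_set,
        key=lambda row: euclidean_distance(row, test_instance, length),
    )
    return by_distance[:k]

train_set = [[2, 2, 2, "a"], [4, 4, 4, "b"]]
print(get_neighbors(train_set, [5, 5, 5], 1))  # [[4, 4, 4, 'b']]
```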

All right. Now once you have located

the most similar neighbors for a test instance, the next task

is to predict a response based on those neighbors. So how can we do that? Well, we can do this by allowing each neighbor

to vote for its class attribute and taking the majority vote

as the prediction. Let's see how we can do that. So we have a function, get response, which takes

neighbors as the input. Well, these neighbors are nothing

but the output of the get neighbors function. The output of the get neighbors function

will be fed to get response. All right. Let's execute the Run button. It's executed. Let's move ahead and test

our function get response.
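
A majority-vote function of this kind might look like the sketch below. The name get_response and the use of collections.Counter are choices made for this illustration; it assumes the class label is the last field of each neighbor row.

```python
from collections import Counter

def get_response(neighbors):
    """Each neighbor votes with its class label (last field); the majority wins."""
    votes = Counter(row[-1] for row in neighbors)
    return votes.most_common(1)[0][0]

neighbors = [[1, 1, 1, "a"], [2, 2, 2, "a"], [3, 3, 3, "b"]]
print(get_response(neighbors))  # a
```

With two votes for class a and one for class b, the vote goes to class a.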

So we have our neighbors as 1, 1,

1, which belongs to class A; 2, 2, 2, which belongs to class A; and 3, 3, 3, which belongs to class B. So this response

variable will store the value

of get response when we pass it these neighbors. All right. So what we want

to check is whether the test instance's

final outcome will belong to class A or class B when the neighbors are

1, 1, 1, A; 2, 2, 2, A; and 3, 3, 3, B. So, let's check our response. Now we have created

all the different functions which are required

for the KNN algorithm. So the main remaining concern is how you evaluate

the accuracy of the predictions. An easy way to evaluate the accuracy of the model

is to calculate the ratio of the total correct predictions

to all the predictions made.

So for this I will

be defining a function, get accuracy, and inside that I'll be passing

my test data set and the predictions. The get

accuracy function executed without any error. Let's check it

on a sample data set. So we have our test data set as

1, 1, 1, which belongs to class A; 2, 2, 2, which again belongs to class A; and

3, 3, 3, which belongs to class B. And my predictions are these:

for the first test data, it predicted that it belongs

to class A, which is true. For the next, it predicted

that it belongs to class A, which is again true. And for

the next, it again predicted that it belongs to class A,

which is false in this case, because the test data

belongs to class B. All right. So in total we have two correct

predictions out of three. All right, so the ratio

will be 2 by 3, which is nothing but 66.6 percent. So our accuracy rate is 66.6 percent. So now that you

have created all the function that are required

for KNN algorithm.
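
The accuracy check from a moment ago can be sketched in a few lines. The name get_accuracy and the percentage return value are assumptions for this illustration; the test set and predictions are the sample values from the example.

```python
def get_accuracy(test_set, predictions):
    """Percentage of rows whose class label (last field) matches the prediction."""
    correct = sum(1 for row, pred in zip(test_set, predictions) if row[-1] == pred)
    return correct / len(test_set) * 100.0

test_set = [[1, 1, 1, "a"], [2, 2, 2, "a"], [3, 3, 3, "b"]]
predictions = ["a", "a", "a"]
print(round(get_accuracy(test_set, predictions), 2))  # 66.67
```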

Let's compile them

into one single main function. Alright, so this is

our main function, and we are using the Iris data set with a split of 0.67, and

the value of K is 3. Let's see what the accuracy score

is and check how accurate our model is. So

in the training data set, we have a hundred

and thirteen values, and in the test data set we have 37 values. These are the predicted and the actual values

of the output. Okay. So in total we got

an accuracy of 97.29 percent,

which is really very good. All right, so I hope the concept

of the KNN algorithm is clear to you. In a world

full of machine learning and artificial intelligence

surrounding almost everything around us, classification and prediction are among the most important aspects

of machine learning.

So before moving forward, let's have a quick look

at the agenda. I'll start off this video

by explaining to you guys what exactly Naive Bayes is, and then what

Bayes' theorem is, which serves as the logic behind the Naive Bayes

algorithm. Going forward, I'll explain the steps involved in the Naive Bayes

algorithm one by one, and finally I'll finish

off this video with a demo of Naive Bayes using

the scikit-learn package. Now, Naive Bayes is a simple but

surprisingly powerful algorithm for predictive analysis. It is a classification technique

based on Bayes' theorem with an assumption of

independence among predictors. It comprises two parts,

which are naive and Bayes. In simple terms, a

Naive Bayes classifier assumes that the presence

of a particular feature in a class is unrelated

to the presence of any other feature. Even if these features

depend on each other or upon the existence

of the other features, all of these properties

independently contribute to the probability of whether a fruit is an apple

or an orange or a banana.

So that is why it is known as naive. Now, a Naive

Bayes model is easy to build and particularly useful

for very large data sets. In probability theory

and statistics, Bayes' theorem, which is alternatively

known as Bayes' law or Bayes' rule, describes

the probability of an event based on prior knowledge

of the conditions that might be related

to the event. Now, Bayes' theorem is a way to figure

out conditional probability. The conditional probability

is the probability of an event happening given that it has some relationship

to one or more other events. For example, your probability

of getting a parking space is connected to the time

of the day you park,

where you park, and what conventions are

going on at that time. Bayes' theorem is slightly

more nuanced. In a nutshell, it gives you the actual

probability of an event given information about tests. Now, if you look

at the definition of Bayes' theorem, we can see

that given a hypothesis H and the evidence

E, Bayes' theorem states the relationship between the

probability of the hypothesis before getting the evidence, which is P of H,

and the probability of the hypothesis

after getting the evidence, which is P of H given E:

P of H given E is defined as the probability of E given H into the probability of H, divided by the probability of E.

It's rather confusing, right? So let's take an example

to understand this theorem. So suppose I have

a deck of cards and if a single card is drawn

from the deck of playing cards, the probability that the card

is a king is 4 by 52, since there are four kings

in a standard deck of 52 cards. Now if King is the event

that this card is a king, the probability of King

is given as 4 by 52, which is equal to 1 by 13. Now if evidence is provided,

for instance someone looks at the card and says that the single card

is a face card, the probability of King given that it's a face

can be calculated using Bayes' theorem

by this formula.

Since every king

is also a face card, the probability of face given that it's a king is equal to 1. And since there are

three face cards in each suit, that is, the jack, king, and queen, the probability of a face card

is equal to 12 by 52, that is, 3 by 13. Now using Bayes' theorem we

can find out the probability of King given that it's a face. So our final answer

comes to 1 by 3, which is also true intuitively. If you have a deck of cards which has only faces,

there are three types of faces, which are the jack, king,

and queen, so the probability that it's the king is 1 by 3.
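
The card example can be verified with exact fractions in Python. This is only a small check of the arithmetic; the variable names are chosen for this sketch.

```python
from fractions import Fraction

p_king = Fraction(4, 52)         # four kings in a 52-card deck
p_face = Fraction(12, 52)        # jack, queen, and king in each of four suits
p_face_given_king = Fraction(1)  # every king is a face card

# Bayes' theorem: P(King | Face) = P(Face | King) * P(King) / P(Face)
p_king_given_face = p_face_given_king * p_king / p_face
print(p_king_given_face)  # 1/3
```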

Now, this is a simple example

of how Bayes' theorem works. Now let's look at the proof, as in

how this Bayes' theorem is derived. So here we have

the probability of A given B and the probability of B

given A. Now for a joint probability distribution

over the events A and B, with the probability of

A intersection B, the conditional probability

of A given B is defined as the probability of A intersection B divided

by the probability of B, and similarly the probability of B given A is defined as

the probability of B intersection A divided by the probability

of A. Now we can equate the probability of

A intersection B and the probability of B intersection A, as

both are the same thing. Now from this, as you can see, we get our final

Bayes' theorem proof, which is that the probability of A

given B equals the probability of B given A into the probability of A, divided by

the probability of B. Now, this is the equation that applies to

any probability distribution over the events A and B.
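
Written out in symbols, the derivation goes as follows. It assumes that P(A) and P(B) are both greater than zero so that the conditional probabilities are defined.

```latex
\begin{align}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)}, &
P(B \mid A) &= \frac{P(B \cap A)}{P(A)} \\
P(A \cap B) &= P(B \cap A)
  \;\Longrightarrow\; P(A \mid B)\,P(B) = P(B \mid A)\,P(A) \\
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)}
\end{align}
```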

It has a particularly nice

interpretation in the case where A is represented

as the hypothesis H and B is represented as some observed evidence E.

In that case the formula is: P of H given E is equal to P of E given H into the probability of H, divided

by the probability of E. Now this relates the probability of the hypothesis before

getting the evidence, which is P of H,

to the probability of the hypothesis

after getting the evidence, which is P of H given E. For this reason P of H is known

as the prior probability, while P of H given E is known

as the posterior probability, and the factor that relates the two, P of E given H divided by P of E, is known as

the likelihood ratio. Now using these terms, Bayes' theorem

can be rephrased as: the posterior

probability equals the prior probability

times the likelihood ratio. So now that we know the maths which is involved

behind Bayes' theorem, let's see how we can implement

this in a real-life scenario.

So suppose we have a data set in which we have

the outlook, the humidity, and the wind, and we need to find out whether we should play

or not on that day. So the outlook can be

sunny, overcast, or rain; the humidity is high or normal; and the wind is categorized

into two classes, which are the weak

and the strong winds. First of all, we'll create

a frequency table for each attribute of the data set. So the frequency table

for the outlook looks like this: we have sunny, overcast,

and rainy. The frequency table of humidity looks like this, and the frequency table of wind

looks like this. We have strong and weak for wind, and high

and normal ranges for humidity. So for each frequency table, we will generate

a likelihood table. Now, the likelihood table

contains the probability of a particular day.

Suppose we take sunny, and we take play as yes and no. So the probability

of sunny given that we play yes is 3 by 10, which is 0.3. The

probability of X, which is the

probability of sunny, is equal to 5 by 14. Now, these are all terms which are just generated

from the data which we have. And finally, the probability

of yes is 10 out of 14.

So if we have a look

at the likelihood of yes given that it's sunny, we

can see, using Bayes' theorem, that it's the probability

of sunny given yes into the probability of yes, divided

by the probability of sunny. So we have all

the values here calculated. So if you put

them in our Bayes' theorem equation, we get the likelihood of yes as 0.59. Similarly, the likelihood of no can also be calculated;

here it is 0.40. Now similarly, we are going to create

the likelihood tables for both the humidity and the wind. So for humidity, the likelihood

of yes given that the humidity is high is equal to 0.42,

and the probability of playing no

given that the humidity is high is 0.58. Similarly, for the wind table,

the probability of yes given that the wind is weak is 0.75,

and the probability of no given that the wind is weak is 0.25.

Now suppose we have a day which has rain, high humidity,

and weak wind. So should we play or not? For that, we use Bayes' theorem

here again. The likelihood of yes on that day is equal to the probability

of outlook rain given that it's a yes, into the probability

of high humidity given that it's a yes, into the probability of

weak wind given that we are playing yes,

into the probability of yes, which equals 0.019.

And similarly, the likelihood

of no on that day is equal to 0.016.
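
These two scores are not yet probabilities, because they do not add up to 1. A short sketch of the normalization step, using only the two rounded scores quoted here (the variable names are made up for this illustration):

```python
score_yes = 0.019  # likelihood of yes for the rainy, high-humidity, weak-wind day
score_no = 0.016   # likelihood of no for the same day

# Divide each score by the sum of both so the results add up to 1
total = score_yes + score_no
p_yes = score_yes / total
p_no = score_no / total
print(round(p_yes, 2), round(p_no, 2))  # 0.54 0.46
```

With the rounded scores the split comes out at roughly 0.54 to 0.46, which is close to the figures quoted for this example. The small gap presumably comes from rounding the scores to three decimal places before dividing.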

Now if we look at the probability

of yes for playing on that day, we just

need to divide it by the sum

of the likelihoods of both yes and no. So the probability

of playing tomorrow, which is yes, is 0.55, whereas the probability

of not playing is equal to 0.45. Now, this is based upon the data

which we already have with us. So now that you have an idea

of what exactly Naive Bayes is and how it works, and we have seen how it can be implemented

on a particular data set, let's see where it

is used in the industry. Let's start with our first

industrial use case, which is news categorization

or we can use the term text classification

to broaden the spectrum of this algorithm. News on the web is rapidly growing

in the information age, where each news site has

its own different layout and categorization

for grouping news. Now this heterogeneity of layout and categorization

cannot always satisfy an individual user's needs.

Removing this heterogeneity and classifying

the news articles according to user preference

is a formidable task. Companies use web crawlers

to extract useful text from the HTML pages

of the news articles, and each of these news articles is then tokenized. Now

these tokens are nothing but the categories

of the news. Now in order to achieve

better classification results,

we remove the less

significant words, which are the stop words,

from the documents or the articles, and then we apply

the Naive Bayes classifier for classifying the news

contents based on the news. Now, this next one is by far one of the best examples

of the Naive Bayes classifier, which is spam filtering. Now, Naive

Bayes classifiers are a popular statistical technique

for email filtering. They typically use bag

of words features to identify spam email, an approach commonly used

in text classification as well. Now it works by correlating

the use of tokens with spam and non-spam emails,

and then Bayes' theorem, which I explained earlier, is used to

calculate the probability that an email is or is not spam. So Naive

Bayes spam filtering is a baseline technique

for dealing with spam that can tailor itself to the email needs

of an individual user and give low false positive

spam detection rates that are generally

acceptable to users.

It is one of the oldest ways

of doing spam filtering, with its roots in the 1990s. Particular words

have particular probabilities of occurring in spam email and in legitimate email

as well. For instance, most email users will frequently encounter

the word lottery or lucky draw in a spam email, but will seldom

see it in other emails. The filter doesn't know

these probabilities in advance and must first be trained

so it can build them

up. To train the filter, the user must manually indicate whether a new email is spam

or not. For all the words in each training email, the filter will

adjust the probability that each word will appear

in spam or legitimate email in the database. Now after training, the word

probabilities, also known as the likelihood functions, are

used to compute the probability that an email with a particular

set of words in it belongs to either category. Each word in the email contributes

to the email's spam probability. This contribution is called

the posterior probability and is computed again

using Bayes' theorem. Then the email's spam probability is computed over all

the words in the email, and if the total exceeds

a certain threshold, say 95%, the filter will mark

the email as spam. Now object detection is

the process of finding instances of real-world objects

such as faces, bicycles, and buildings in images or video. Now, object detection algorithms typically

use extracted features and learning algorithms to recognize instances of

an object category. Here again, Naive Bayes plays an important

role in the categorization and classification of objects.

Now, the medical area.

There is an increasingly voluminous

amount of electronic data, which is becoming more

and more complicated. The medical data produced

has certain characteristics that make the analysis

very challenging and attractive as well. Among all

the different approaches, Naive Bayes is used. It is one of the most effective

and efficient classification algorithms and has

been successfully applied to many medical problems.

An empirical comparison of Naive Bayes versus

five popular classifiers on medical data sets shows that Naive Bayes is well suited

for medical applications and has high performance in most

of the examined medical problems. Now in the past, various

statistical methods have been used for modeling in the area

of disease diagnosis. These methods require

prior assumptions and are less capable of dealing with massive and complicated

nonlinear and dependent data. One of the main advantages

of the Naive Bayes approach, which is appealing

to physicians, is that all the available

information is used to explain the decision.

This explanation seems to be natural for medical

diagnosis and prognosis.

That is, it is very

close to the way physicians diagnose patients.

Now, weather is one of the most influential factors

in our daily life, to an extent that it may affect

the economy of a country that depends on occupations

like agriculture. Therefore, as a countermeasure

to reduce the damage caused by uncertainty

in weather behavior, there should be an efficient way

to predict the weather. Now, weather forecasting

has been a challenging problem in meteorological departments for years. Even

after the technological and scientific

advancement, the accuracy of weather prediction

has never been sufficient. Even today this domain

remains a research topic in which scientists and mathematicians are working

to produce a model or an algorithm that will accurately

predict the weather. Now, a Bayesian approach

based model is created, where posterior probabilities

are used to calculate the likelihood of

each class label for an input

data instance, and the one

with the maximum likelihood is considered the resulting

output. Now earlier, we saw a small implementation

of this algorithm as well, where we predicted whether we should play

or not based on the data which we had collected earlier. Now, there is a Python library which is known as scikit-learn.

It helps to build a Naive Bayes model in Python. Now, there are three types

of Naive Bayes models under the scikit-learn library.

The first one is the Gaussian. It is used in classification,

and it assumes that the features follow

a normal distribution. The next we have is multinomial. It is used for discrete counts. For example, let's say we have

a text classification problem. Here we

consider Bernoulli trials one step further: instead of a word

occurring in the document, we count how often the word occurs in the document. You

can think of it as the number of times

an outcome is observed in the given number of trials. And finally we have

the Bernoulli type of Naive Bayes. The binomial model is useful if your feature vectors are

binary, as in a bag of words model where the ones and the zeros are the words that occur

in the document and the words which do not occur in the document, respectively.

Based on your data set, you can choose any of

the models discussed here, which are the Gaussian,

the multinomial, or the Bernoulli.

So let's understand

how this algorithm works and what

the different steps are that one can take to create

a Bayesian model and use Naive Bayes to predict the output. So

here, to understand better, we are going to predict

the onset of diabetes. Now this problem comprises 768 observations

of medical details for Pima Indian patients. The records describe

instantaneous measurements taken from the patient, such as

the age, the number of times pregnant, and the blood workup. Now all

the patients are women aged 21 and older, and all

the attributes are numeric, and the units vary

from attribute to attribute. Each record has

a class value that indicates whether the patient suffered

an onset of diabetes within five years

of the measurements. Now, these are classified as 1, or as 0 if not. Now, I've broken the whole process

down into the following steps.

The first step is handling the data, in which we load the data from the CSV file and split it into training and test data sets. The second step is summarizing the data, in which we summarize the properties in the training data set so that we can calculate the probabilities and make predictions. The third step is making a particular prediction: we use the summaries of the data set to generate a single prediction. After that, we generate predictions given a test data set and a summarized training data set. Then we evaluate the accuracy of the predictions made for a test data set as the percentage correct out of all the predictions made. And finally, we tie it all together and form our own model of a Naive Bayes classifier.

Now, the first thing we need to do is load our data. The data is in CSV format without a header line or any quotes. We can open the file with the open function and read the data lines using the reader function in the csv module. We also need to convert the attributes that were loaded as strings into numbers so that we can work with them. So let me show you how this can be implemented. For that, you need to install Python on your system and use the Jupyter notebook or the Python shell. Here, I'm using the Anaconda Navigator, which has all the things required to do the programming in Python.

We have JupyterLab. We have the notebook. We have the Qt console. We even have RStudio as well. So what you need to do is just install the Anaconda Navigator; it comes with Python preinstalled. The moment you click launch on the Jupyter Notebook, it will take you to the Jupyter home page on your local system, and here you can do programming in Python. So let me just rename it as Pima Indian diabetes. First, we need to load the data set, so I'm creating a function here, load CSV. Before that, we need to import the csv, math and random modules. As you can see, I've created a load CSV function which reads the Pima Indian diabetes data .csv file using the csv.reader method, and then we convert every element of that data set into a float. Originally all the elements are strings, but we need to convert them into floats for all calculation purposes.
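The loading step described above can be sketched roughly as follows. This is a minimal sketch, not the exact code from the video: the name `load_csv` follows the function mentioned above, but here it takes an open file object (an assumption, so the example can run on an in-memory sample), and the two sample rows are made up.

```python
import csv
import io

def load_csv(file_obj):
    """Read comma-separated rows and convert every value to float."""
    rows = csv.reader(file_obj)
    # Skip blank lines so float() never sees an empty row.
    return [[float(value) for value in row] for row in rows if row]

# Two made-up rows: attribute values followed by a class value, no header line.
sample = io.StringIO("5,120,70,30,0,30.5,0.5,40,1\n2,90,60,25,0,25.0,0.3,28,0\n")
dataset = load_csv(sample)
print(len(dataset), dataset[0])
```

For a real file, you would pass the result of `open(filename, newline="")` instead of the `io.StringIO` sample, where `filename` is whatever your CSV file is called.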

Next, we need to split the data into a training data set that Naive Bayes can use to make predictions and a test data set that we can use to evaluate the accuracy of the model. We need to split the data set randomly into training and testing data sets, usually in a ratio of 70:30. But for this example, I'm going to use 67 and 33. Now, 70 and 30 is a common ratio for testing algorithms, so you can play around with this number. So this is our split data set function.

Now, the Naive Bayes model is comprised of a summary of the data in the training data set. This summary is then used while making predictions. The summary of the training data collected involves the mean and the standard deviation of each attribute, by class value. For example, if there are two class values and seven numerical attributes, then we need a mean and a standard deviation for each combination of the seven attributes and the class value, which makes 14 attribute summaries. We can break the preparation of this summary down into the following subtasks: separating data by class, calculating the mean, calculating the standard deviation, summarizing the data set, and summarizing attributes by class.

So the first task is to separate the training data set instances by class value so that we can calculate statistics for each class. We can do that by creating a map of each class value to a list of instances that belong to that class, and sorting the entire data set of instances into the appropriate lists. The separate by class function does just that. As you can see, the function assumes that the last attribute is the class value, and it returns a map of class value to the list of data instances. Next, we need to calculate the mean of each attribute for a class value. The mean is the central middle, or the central tendency, of the data, and we use it as the middle of our Gaussian distribution when calculating the probabilities.
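To make the two ideas above concrete, here is a minimal sketch of separating rows by class and taking a mean. The function names `separate_by_class` and `mean` are my own spelling of the functions mentioned above, and the toy rows are made up; the only thing taken from the description is that the last element of each row is the class value.

```python
def separate_by_class(dataset):
    """Map each class value (last element of a row) to the rows of that class."""
    separated = {}
    for row in dataset:
        separated.setdefault(row[-1], []).append(row)
    return separated

def mean(numbers):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(numbers) / float(len(numbers))

# Made-up rows: two attributes followed by the class value.
toy_rows = [[1.0, 20.0, 1], [2.0, 21.0, 0], [3.0, 22.0, 1]]
separated = separate_by_class(toy_rows)
print(separated)
print(mean([row[0] for row in separated[1]]))
```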

So this is our function for the mean. We also need to calculate the standard deviation of each attribute for a class value. The standard deviation is calculated as the square root of the variance, and the variance is calculated as the average of the squared differences of each attribute value from the mean. One thing to note here is that we are using the n minus one method, which subtracts one from the number of attribute values when calculating the variance.
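The n minus one variance described above can be sketched like this. It is a minimal sketch with made-up values; the function name `stdev` is an assumption, and the function needs at least two values because it divides by n - 1.

```python
import math

def mean(numbers):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    """Sample standard deviation: the variance divides by n - 1, not n."""
    avg = mean(numbers)
    variance = sum((x - avg) ** 2 for x in numbers) / float(len(numbers) - 1)
    return math.sqrt(variance)

values = [1.0, 2.0, 3.0, 4.0, 5.0]
print(mean(values), stdev(values))
```

Here the squared differences sum to 10, so the variance is 10 / 4 = 2.5 rather than 10 / 5 = 2.0.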

Now that we have the tools to summarize the data, for a given list of instances we can calculate the mean and standard deviation for each attribute. The zip function groups the values for each attribute across our data instances into their own lists so that we can compute the mean and standard deviation values for each attribute. Next comes summarizing attributes by class. We can pull it all together by first separating our training data set into instances grouped by class, then calculating the summaries for each attribute.

Now we are ready to make predictions using the summaries prepared from our training data. Making predictions involves calculating the probability that a given data instance belongs to each class, then selecting the class with the largest probability as the prediction. We can divide this whole method into four tasks: calculating the Gaussian probability density function, calculating class probabilities, making a prediction, and then estimating the accuracy. First, the Gaussian probability density function.

We use the Gaussian function to estimate the probability of a given attribute value, given the known mean and standard deviation of the attribute estimated from the training data. As you can see, the parameters are x, the mean and the standard deviation. In the calculate probability function, we calculate the exponent first and then calculate the main division; this lets us fit the equation nicely into two lines.

The next task is calculating the class probabilities. Now that we can calculate the probability of an attribute belonging to a class, we can combine the probabilities of all the attribute values for a data instance and come up with a probability of the entire data instance belonging to the class.

Now that we have calculated the class probabilities, it's time to finally make our first prediction. We can calculate the probability of the data instance belonging to each class value, look for the largest probability and return the associated class. For that, we are going to use the predict function, which takes the summaries and the input vector, which is basically all the values being input for a particular instance.

Finally, we can estimate the accuracy of the model by making predictions for each data instance in our test data. For that, we use the get predictions method, which calculates the predictions based on the test data set and the summary of the training data set. The predictions can then be compared to the class values in our test data set, and the classification accuracy can be calculated as an accuracy ratio between 0 and 100 percent. The get accuracy method calculates this accuracy ratio. Now, finally, to sum it all up.

We define our main function, where we call all these methods which we defined earlier, one by one, to get the accuracy of the model we have created. As you can see, this is our main function, in which we have the file name, we have defined the split ratio, we have the data set, and we have the training and test data sets. We are using the split data set method, next the summarize by class function, and then the get predictions and the get accuracy methods as well.

So guys, as you can see, the output shows that we are splitting the 768 rows into 514 rows, which is the training data set, and 254 rows, which is the test data set, and the accuracy of this model is 68 percent. Now we can play with the amount of training and test data to be used, so we can change the split ratio to 70:30 or 80:20 to get a different accuracy. Suppose I change the split ratio from 0.67 to 0.8. As you can see, we get an accuracy of 62 percent, so splitting it at 0.67 gave us a better result, which was 68 percent.

So this is how you can implement a Gaussian Naive Bayes classifier. These are the step-by-step methods you need to follow when using the Naive Bayes classifier. But don't worry, we do not need to write this many lines of code to make a model. This is where the scikit-learn library really comes into the picture. The scikit-learn library has a predefined method, or say a predefined function, for Naive Bayes, which converts all of these lines of code into merely two or three lines of code.
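Before switching to scikit-learn, it may help to see the from-scratch steps described above in one place: splitting, summarizing by class, the Gaussian density, class probabilities, prediction and accuracy. This is a minimal sketch under stated assumptions, not the code shown in the video: the snake_case function names are my own, the data is synthetic (two made-up Gaussian blobs instead of the diabetes file), and class priors are left out because the walkthrough does not mention them.

```python
import math
import random

def split_dataset(dataset, split_ratio, seed=None):
    """Randomly move split_ratio of the rows into a training set."""
    rng = random.Random(seed)
    test = list(dataset)
    train = []
    train_size = int(len(dataset) * split_ratio)
    while len(train) < train_size:
        train.append(test.pop(rng.randrange(len(test))))
    return train, test

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    return math.sqrt(sum((x - avg) ** 2 for x in numbers) / float(len(numbers) - 1))

def summarize_by_class(dataset):
    """Per class: a list of (mean, stdev) pairs, one per attribute."""
    separated = {}
    for row in dataset:
        separated.setdefault(row[-1], []).append(row)
    summaries = {}
    for class_value, rows in separated.items():
        columns = list(zip(*rows))[:-1]  # zip groups values by attribute; drop the class column
        summaries[class_value] = [(mean(col), stdev(col)) for col in columns]
    return summaries

def calculate_probability(x, mu, sigma):
    """Gaussian probability density of x: exponent first, then the main division."""
    exponent = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
    return (1 / (math.sqrt(2 * math.pi) * sigma)) * exponent

def calculate_class_probabilities(summaries, input_vector):
    """Multiply the per-attribute densities for each class (priors not included)."""
    probabilities = {}
    for class_value, class_summaries in summaries.items():
        probabilities[class_value] = 1.0
        # zip stops at the last attribute, so a trailing class value is ignored.
        for (mu, sigma), x in zip(class_summaries, input_vector):
            probabilities[class_value] *= calculate_probability(x, mu, sigma)
    return probabilities

def predict(summaries, input_vector):
    """Return the class with the largest probability."""
    probabilities = calculate_class_probabilities(summaries, input_vector)
    return max(probabilities, key=probabilities.get)

def get_predictions(summaries, test_set):
    return [predict(summaries, row) for row in test_set]

def get_accuracy(test_set, predictions):
    """Percentage of test rows whose class value matches the prediction."""
    correct = sum(1 for row, p in zip(test_set, predictions) if row[-1] == p)
    return correct / float(len(test_set)) * 100.0

# Synthetic two-class data: two attributes plus a class value in the last column.
rng = random.Random(0)
dataset = [[rng.gauss(0, 1), rng.gauss(0, 1), 0] for _ in range(100)]
dataset += [[rng.gauss(5, 1), rng.gauss(5, 1), 1] for _ in range(100)]

train, test = split_dataset(dataset, 0.67, seed=1)
summaries = summarize_by_class(train)
predictions = get_predictions(summaries, test)
print(len(train), len(test), get_accuracy(test, predictions))
```

Because the split is random, the accuracy on real data can shift from run to run unless you fix the seed, which is worth keeping in mind when comparing split ratios.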

So, let me just open another Jupyter notebook, and let me name it sklearn Naive Bayes. Here we are going to use the most famous data set, which is the Iris data set. The Iris flower data set is a multivariate data set introduced by the British statistician and biologist Ronald Fisher, and based on Fisher's linear discriminant model, this data set became a typical test case for many statistical classification techniques in machine learning. So here we are going to use the GaussianNB model, which is already available in sklearn. As I mentioned earlier, there are three types of Naive Bayes: the Gaussian, the multinomial and the Bernoulli.

So, first of all, what we need to do is import the sklearn datasets and metrics, and we also need to import GaussianNB from the sklearn library, which is the scikit-learn library. Once all these libraries are loaded, we need to load the data set, which is the Iris data set. Next, what we need to do is fit a Naive Bayes model to this data set.

As you can see, we have so easily defined the model, which is GaussianNB, and it contains all the programming I just showed you earlier: all the methods that take the input, calculate the mean and the standard deviation, separate it by class, and finally make predictions and calculate the prediction accuracy. All of this comes under GaussianNB, which is already present in the sklearn library.

We just need to fit it to the data set we have. Next, if we print the model, we see that it is the GaussianNB model. Next, what we need to do is make the predictions. The expected output is dataset.target, and the predicted output comes from the predict method of the model we are using, which is GaussianNB here. To summarize the model we created, we calculate the confusion matrix and the classification report. So guys, as you can see in the classification report, we have a precision of 0.96, we have a recall of 0.96, and we have the F1 score and the support. And finally, if we print our confusion matrix, as you can see, it gives us this output.
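The scikit-learn steps just described can be sketched as follows. This is a minimal sketch that follows the description above (load Iris, fit `GaussianNB`, predict, then print the classification report and confusion matrix); the variable names are assumptions, and the printed numbers may differ slightly between library versions.

```python
from sklearn import datasets, metrics
from sklearn.naive_bayes import GaussianNB

# Load the Iris data set that ships with scikit-learn.
dataset = datasets.load_iris()

# Fit a Gaussian Naive Bayes model to the data.
model = GaussianNB()
model.fit(dataset.data, dataset.target)
print(model)

# Compare the expected classes with the model's predictions.
expected = dataset.target
predicted = model.predict(dataset.data)

print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
```

Note that, as in the demo, the predictions here are made on the same rows the model was fitted on; for a fairer estimate you would normally hold out a test set, for example with `train_test_split` from `sklearn.model_selection`.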

So as you can see, using the GaussianNB method, putting it in the model, fitting the model you created to a particular data set and getting the desired output is so easy with the scikit-learn library.

SVM, or support vector machine, is one of the most effective machine learning classifiers, and it has been used in various fields such as face recognition, cancer classification and so on. Today's session is dedicated to how SVM works, the various features of SVM and how it is used in the real world.

All right. Okay. Now let's move on and see what the SVM algorithm is all about. So guys, SVM, or support vector machine, is a supervised learning algorithm which is mainly used to classify data into different classes. Unlike most algorithms, SVM makes use of a hyperplane, which acts like a decision boundary between the various classes. In general, SVM can be used to generate multiple separating hyperplanes so that the data is divided into segments, and each of these segments will contain only one kind of data. It's mainly used for classification purposes, where you want to classify your data into two different segments depending on the features of the data.

Now, before moving any further, let's discuss a few features of SVM. Like I mentioned earlier, SVM is a supervised learning algorithm. This means that SVM trains on a set of labeled data. SVM studies the labeled training data and then classifies any new input data depending on what it learned in the training phase. A main advantage of support vector machine is that it can be used for both classification and regression problems. Even though SVM is mainly known for classification, the SVR, which is the support vector regressor, is used for regression problems.
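As a small illustration of the classification and regression sides of SVM, here is a minimal sketch using scikit-learn's `SVC` and `SVR` classes. The toy points and the parameter values (`C=100.0`, `epsilon=0.1`) are assumptions picked for the example, not something from the session.

```python
import numpy as np
from sklearn.svm import SVC, SVR

# Classification: two well-separated toy groups of labeled points.
X_cls = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 0.0],
                  [4.0, 4.0], [4.5, 5.0], [5.0, 4.0]])
y_cls = np.array([0, 0, 0, 1, 1, 1])
classifier = SVC(kernel="linear").fit(X_cls, y_cls)
print(classifier.predict([[0.2, 0.3], [4.8, 4.6]]))

# Regression: noise-free points on the line y = 2x, fitted with a linear SVR.
X_reg = np.arange(10, dtype=float).reshape(-1, 1)
y_reg = 2.0 * X_reg.ravel()
regressor = SVR(kernel="linear", C=100.0, epsilon=0.1).fit(X_reg, y_reg)
print(regressor.predict([[4.5]]))
```

The classifier returns a class label for each new point, while the regressor returns a continuous value, here close to 9 for an input of 4.5.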

All right, so SVM can be used both for classification and for regression. This is one of the reasons why a lot of people prefer SVM: it's a very good classifier, and along with that it is also used for regression. Okay. Another feature is the SVM kernel functions. SVM can be used for classifying nonlinear data by using the kernel trick. The kernel trick basically means transforming your data into another dimension so that you can easily draw a hyperplane between the different classes of the data. Nonlinear data is basically data which cannot be separated with a straight line. So SVM can even be used on nonlinear data sets; you just have to use kernel functions to do this.
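To see the effect of a kernel function on nonlinear data, here is a minimal sketch that compares a linear kernel with an RBF kernel in scikit-learn. The data comes from `make_circles`, a synthetic data set of two concentric rings that I chose for illustration; the parameter values are assumptions.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate the classes.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))
print("rbf kernel accuracy:", rbf_svm.score(X, y))
```

The only change between the two models is the `kernel` argument, which is what makes the kernel trick convenient in practice.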

All right. So guys, I hope you all are clear with the basic concepts of SVM. Now, let's move on and look at how SVM works. In order to understand how SVM works, let's consider a small scenario. For a second, pretend that you own a farm, and let's say that you have a problem: you want to set up a fence to protect your rabbits from a pack of wolves. Okay, but where do you build your fence? One way to get around the problem is to build a classifier based on the position of the rabbits and wolves in your pasture. So what I'm telling you is that you can classify the group of rabbits as one group and draw a decision boundary between the rabbits and the wolves. Correct?

So if I do that, and if I try to draw a decision boundary between the rabbits and the wolves, it looks something like this. Okay. Now you can clearly build a fence along this line. In simple terms, this is exactly how SVM works: it draws a decision boundary, which is a hyperplane, between any two classes in order to separate them or classify them. Now, I know you're thinking: how do you know where to draw a hyperplane? The basic principle behind SVM is to draw a hyperplane that best separates the two classes, in our case the two classes of the rabbits and the wolves. So you start off by drawing a random hyperplane, and then you check the distance between the hyperplane and the closest data points from each class. These closest, or nearest, data points to the hyperplane are known as support vectors.

And that's where the name comes from: support vector machine. So basically, the hyperplane is drawn based on these support vectors. An optimal hyperplane will have a maximum distance from each of these support vectors. In other words, the hyperplane which has the maximum distance from the support vectors is the most optimal hyperplane, and this distance between the hyperplane and the support vectors is known as the margin. All right, so to sum it up, SVM is used to classify data by using a hyperplane such that the distance between the hyperplane and the support vectors is maximum. So basically, your margin has to be maximum.
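The margin idea above can be checked with a little arithmetic. This is a minimal sketch that compares two hand-picked candidate hyperplanes rather than solving the actual SVM optimization; the point coordinates and the two candidate lines are made up for the fence example. It uses the standard distance from a point x to the hyperplane w.x + b = 0, which is |w.x + b| / ||w||.

```python
import math

def distance_to_hyperplane(point, w, b):
    """Perpendicular distance from a point to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, point))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))

def margin(points, w, b):
    """Distance from the hyperplane to its closest points (the support vectors)."""
    return min(distance_to_hyperplane(p, w, b) for p in points)

# Made-up 2D positions for the two groups in the fence example.
rabbits = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
wolves = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
points = rabbits + wolves

# Two candidate fences that both separate the groups:
# the diagonal line x + y - 6.5 = 0 and the vertical line x - 3.5 = 0.
candidates = {"diagonal": ((1.0, 1.0), -6.5), "vertical": ((1.0, 0.0), -3.5)}
margins = {name: margin(points, w, b) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, best)
```

Both lines separate the rabbits from the wolves, but the diagonal one leaves more room on either side, so it is the better choice under the maximum margin rule.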

All right, that way you know that you're actually separating your classes well, because the distance between the two classes is maximum. Okay. Now, let's try to solve a problem. Let's say that I input a new data point, and now I want to draw a hyperplane such that it best separates the two classes. So I start off by drawing a hyperplane like this, and then I check the distance between the hyperplane and the support vectors. I'm trying to check if the margin is maximum for this hyperplane. But what if I draw a hyperplane which is like this? Now I'm going to check the support vectors over here, then I'm going to check the distance from the support vectors, and for this hyperplane it's clear that the margin is more, right? When you compare the margin of the previous one to this hyperplane, it is more. So the reason why I'm choosing this hyperplane is because the distance between the support vectors and the hyperplane is maximum in this scenario. Okay. So guys, this is how you choose a hyperplane: you basically have to make sure that the hyperplane has a maximum margin, so that it best separates the two classes.

Okay, so far it was quite easy. Our data was linearly separable, which means that you could draw a straight line to separate the two classes. But what will you do if the data set is like this? You possibly can't draw a hyperplane like this on it; it doesn't separate the two classes at all.

So what do you do in such situations? Earlier in the session, I mentioned how a kernel can be used to transform data into another dimension that has a clear dividing margin between the classes of data. So kernel functions offer the user this option of transforming nonlinear spaces into linear ones. A nonlinear data set is one that you can't separate using a straight line. In order to deal with such data sets, you're going to transform them into linear data sets and then use SVM on them. So far, we were plotting our data on a two-dimensional space, correct? We were only using the x and the y axes, so we had only those two variables, x and y. Now, in order to deal with this kind of data, a simple trick would be to transform the two variables x and y into a new feature space involving a new variable called z. So we're basically visualizing the data on a three-dimensional space.
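Here is a minimal sketch of that trick. The session only says that a new variable z is introduced; the choice z = x^2 + y^2 below is a common one and is my assumption, as are the made-up points (one class near the origin, the other on a ring around it).

```python
def add_z(point):
    """Map a 2D point (x, y) to 3D by adding z = x^2 + y^2 (assumed transform)."""
    x, y = point
    return (x, y, x * x + y * y)

# Made-up data: one class near the origin, the other on a ring around it.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4), (0.2, 0.2)]
outer = [(3.0, 0.0), (0.0, -3.0), (-2.0, 2.2), (2.1, -2.1)]

inner_z = [add_z(p)[2] for p in inner]
outer_z = [add_z(p)[2] for p in outer]

# In the new space, the flat plane z = threshold separates the two classes.
threshold = (max(inner_z) + min(outer_z)) / 2.0
print(max(inner_z), min(outer_z), threshold)
```

In the original x-y plane no straight line separates these points, but along the z axis every inner point sits below the threshold and every outer point sits above it.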

Now, when you transform the 2D space into a 3D space, you can clearly see a dividing margin between the two classes of data, right? Now you can go ahead and separate the two classes by drawing the best hyperplane between them. That's exactly what we discussed in the previous slides. So guys, why don't you try this yourself? Try drawing a hyperplane which is the most optimal for these two classes. All right, so guys, I hope you have a good understanding of nonlinear SVMs now. Let's look at a real-world use case of support vector machines. SVM as a classifier has been used in cancer classification since the early 2000s.

So there was an experiment held by a group of professionals who applied SVM to colon cancer tissue classification. The data set consisted of transmembrane protein samples, and only about 50 to 200 gene samples were input into the SVM classifier. The sample which was input into the SVM classifier had both colon cancer tissue samples and normal colon tissue samples, right?

The main objective of this study was to classify gene samples based on whether they are cancerous or not. So SVM was trained using the 50 to 200 samples in order to discriminate between non-tumor and tumor specimens. The performance of the SVM classifier was very accurate even for a small data set. We had only 50 to 200 samples, and even for this small data set SVM was pretty accurate with its results. Not only that, its performance was compared to other classification algorithms like Naive Bayes, and in each case SVM outperformed Naive Bayes. So after this experiment, it was clear that SVM classified the data more effectively and worked exceptionally well with small data sets.

Let's go ahead and understand what exactly unsupervised learning is. Sometimes the given data is unstructured and unlabeled, so it becomes difficult to classify the data into different categories.

So unsupervised learning helps to solve this problem. This learning is used to cluster the input data into classes on the basis of their statistical properties. For example, we can cluster different bikes based on their speed limit, their acceleration, or the average they give. So unsupervised learning is a type of machine learning algorithm used to draw inferences from data sets consisting of input data without labeled responses. If you have a look at the workflow, or the process flow, of unsupervised learning, the training data is a collection of information without any labels. We have the machine learning algorithm, and then we have the clustering models. What it does is distribute the data into different clusters. And again, if you provide any unlabeled new data, it will make a prediction and find out which cluster that particular data point belongs to. One of the most important algorithms in unsupervised learning is clustering.

So let's understand exactly what clustering is. Clustering is basically the process of dividing the data set into groups consisting of similar data points. It means grouping objects based on the information found in the data describing the objects or their relationships. So clustering models focus on identifying groups of similar records and labeling the records according to the group to which they belong. This is done without the benefit of prior knowledge about the groups and their characteristics. In fact, we may not even know exactly how many groups there are to look for. These models are often referred to as unsupervised learning models, since there's no external standard by which to judge the model's classification performance. There are no right or wrong answers for these models. And if we talk about why clustering is used, the goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. Sometimes the partitioning itself is the goal. The purpose of a clustering algorithm is to make sense of, and extract value from, large sets of structured and unstructured data.

So that is why clustering is used in the industry. And if you have a look at the various use cases of clustering in the industry, first of all, it's being used in marketing: discovering distinct groups in customer databases, such as customers who make a lot of long-distance calls or customers who use the internet more than calls. It's also used by insurance companies.

For example, identifying groups of corporation insurance policy holders with a high average claim rate, or farmers' cash crops, which are profitable. It's used in seismic studies to define probable areas of oil or gas exploration based on seismic data. It's also used in the recommendation of movies, it's used for Flickr photos, and it's used by Amazon for recommending a product according to the category it lies in.

So basically, if we talk about clustering, there are three types of clustering. First of all, we have exclusive clustering, which is hard clustering. Here an item belongs exclusively to one cluster, not several clusters. An example of this is k-means clustering, which does this exclusive kind of clustering. Secondly, we have overlapping clustering, which is also known as soft clustering. In this, an item can belong to multiple clusters, and its degree of association with each cluster is shown. For example, we have fuzzy C-means clustering, which is used for overlapping clustering. And finally, we have hierarchical clustering. When two clusters have a parent-child relationship, or a tree-like structure, it is known as hierarchical clustering.

So as you can see here from the example, we have a parent-child kind of relationship in the clusters given here. So let's understand what exactly k-means clustering is. K-means clustering is an algorithm whose main goal is to group similar elements or data points into a cluster. It is the process by which objects are classified into a predefined number of groups so that they are as dissimilar as possible from one group to another, but as similar as possible within each group.

Now, if you have a look at how the algorithm works here: first of all, it starts with identifying the number of clusters, which is K. Then we find the centroids, we find the distance of each object to the centroids, and then we find the grouping based on the minimum distance. Has the centroid converged? If true, then we have our clusters. If false, we find the centroids again and repeat all of the steps again and again.

So let me show you how exactly clustering works with an example here. First we need to decide the number of clusters to be made. Another important task here is how to decide the number of clusters; we'll get into that later. So first, let's assume that the number of clusters we have decided on is 3. After that, we provide the centroids for all the clusters, which is guessing, and the algorithm calculates the Euclidean distance of each point from each centroid and assigns the data point to the closest cluster. Now, Euclidean distance, as all of you know, is the square root of the sum of the squared differences between the coordinates.

Next, the centroids are calculated again, so we have new centroids for the clusters. Again the distances from the points to the new centroids are calculated, and again the points are assigned to the closest cluster. Then we have the new centroids calculated, and these steps are repeated until we have a repetition, that is, the new centroids are very close to the previous ones. So until and unless the output gets repeated, or the outputs are very close, we do not stop this process. We keep on calculating the Euclidean distance of all the points to the centroids, then we calculate the new centroids, and that is how k-means clustering works, basically. So, an important part

here is to understand how to decide the value of K, or the number of clusters, because it does not make any sense if you do not know how many clusters you are going to make. To decide the number of clusters, we have the elbow method. First of all, compute the sum of squared errors, which is the SSE, for some values of K, for example 2, 4, 6 and 8. The SSE is defined as the sum of the squared distances between each member of a cluster and its centroid. Mathematically, it is given by the equation which is provided here.
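Written out, the verbal definition above corresponds to SSE = sum over clusters j of the sum over points x in cluster j of ||x - c_j||^2, where c_j is the centroid of cluster j. To make both the k-means loop and the SSE concrete, here is a minimal from-scratch sketch. It is not the code from the demo: the function names, the random choice of starting centroids and the made-up blob data are all assumptions for illustration.

```python
import math
import random

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_means(points, k, max_iter=100, seed=0):
    """Plain k-means: assign each point to its closest centroid, then update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial guesses picked from the data
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep the centroid of an empty cluster
        if new_centroids == centroids:  # converged: the centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

def sse(centroids, clusters):
    """Sum of squared distances between each cluster member and its centroid."""
    return sum(euclidean(p, c) ** 2
               for c, cluster in zip(centroids, clusters) for p in cluster)

# Made-up data: three blobs of 2D points.
rng = random.Random(42)
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(50)]

# Print K next to its SSE; plotting these pairs gives the elbow curve.
for k in (1, 2, 3, 4, 6, 8):
    centroids, clusters = k_means(points, k)
    print(k, round(sse(centroids, clusters), 2))
```

If you plot the printed pairs, the value of K where the curve bends is the one the elbow method would pick. With random starting centroids a single run can land in a poor solution, which is one reason to try several starts.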

And if you plot K against the SSE, you will see that the error decreases as K gets larger. This is because as the number of clusters increases, the clusters should be smaller, so the distortion is also smaller. Now, the idea of the elbow method is to choose the K at which the SSE decreases abruptly. For example, if we have a look at the figure given here, we see that the best number of clusters is at the elbow. As you can see, the graph changes abruptly after the number four, so for this particular example we're going to use four as the number of clusters.

While working with k-means clustering, there are two key points to know. First of all, be careful about where you start: choose the first center at random, choose the second center far away from the first center, and similarly choose the nth center as far away as possible from the closest of all the other centers. The second idea is to do many runs of k-means, each with different random starting points, so that you get an idea of where exactly and how many clusters you need to make, and where exactly the centroid lies.

And how the data is getting converged. Now, k-means is not exactly a very good method, so let's understand the pros and cons of k-means clustering. We know that k-means is simple and understandable; everyone gets it at the first go. The items are automatically assigned to the clusters. Now, if we have a look at the cons: first of all, one needs to define the number of clusters, which is a very heavy task. Whether we have three, four or ten categories, if we do not know what the number of clusters is going to be, it's difficult for anyone to guess it. Next, all items are forced into clusters; whether or not they actually belong to any other cluster or category, they are forced to lie in the category they are closest to. This again happens because of not defining, or not being able to guess, the correct number of clusters.

And most of all, it's unable to handle noisy data and outliers. Machine learning engineers and data scientists have to clean the data anyway, but then again it comes down to the analysis they are doing and the method they are using. Typically people do not clean the data for k-means clustering, and even if they clean it, there's sometimes noisy and outlier data which affects the whole model. So that was all for k-means clustering. What we're going to do now is use k-means clustering on the movie data set: we have to find out the number of clusters and divide the data accordingly.

So the use case is that, first of all, we have a data set of five thousand movies, and what we want to do is group the movies into clusters based on the Facebook likes. So guys, let's have a look at the demo here. First of all, what we're going to do is import deepcopy, numpy, pandas and seaborn, the various libraries which we're going to use, and from matplotlib we're going to use pyplot, with the ggplot style. Next, what we're going to do is import the data set and look at the shape of the data set. If we have a look at the shape of the data set, we can see that it has 5043 rows with 28 columns. And if you have a look at the head of the data set, we can see it has 5043 data points. So what we're going to do is place the data points in the plot. We take the director Facebook likes. If we have a look at the data columns, we have face number in poster, cast total Facebook likes and director Facebook likes.

So what we have done here is take the director Facebook likes and the actor three Facebook likes, right? So we have 5043 rows and two columns. Now, to use k-means from sklearn, what we're going to do is import it: first we're going to import KMeans from sklearn.cluster. Remember guys, sklearn is a very important library in Python for machine learning. For the number of clusters, what we're going to do is provide five. Again, the number of clusters depends upon the SSE, which is the sum of squared errors, for which we use the elbow method.

So I'm not going to go into the details of that again. We're going to fit the data with the k-means fit, then find the cluster centers for the k-means and print them. What we find is an array of five cluster centers, and if we print the labels of the k-means clusters, we get the cluster assigned to each point.

Next, we plot the data with the new clusters which we have found, and for this we're going to use seaborn. As you can see here, we have plotted the data onto the grid, and you can see that we have five clusters.
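The fitting step described above could look roughly like the sketch below. It assumes scikit-learn and NumPy are installed, and it uses synthetic points as a stand-in for the two like-count columns, since the movie file itself is not part of this transcript.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for two like-count columns (not the real movie data).
rng = np.random.default_rng(0)
blob_centers = np.array([[0, 0], [50, 0], [0, 50], [50, 50], [25, 100]])
X = np.vstack([c + rng.normal(scale=3.0, size=(60, 2)) for c in blob_centers])

# Five clusters, as in the demo; random_state only makes the run repeatable.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
kmeans.fit(X)

print(kmeans.cluster_centers_)   # one row per cluster center
print(kmeans.labels_[:10])       # cluster index assigned to the first 10 points
print(kmeans.inertia_)           # SSE of the fitted clustering
```

The `inertia_` attribute is the same sum of squared errors that the elbow method tracks across different values of K.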

So probably what I would say is that cluster 3 and cluster 0 are very, very close. See, that's exactly what I was going to say: the main challenge in k-means clustering is to define the number of centers, which is K. As you can see here, the third cluster and the zeroth cluster are very close to each other, so they probably could have been one cluster. Another disadvantage was that we do not exactly know how the points are to be arranged, so it's very difficult to force the data into any other cluster, which makes our analysis a little difficult. It works fine, but sometimes it might be difficult to code in k-means clustering.

Now, let's understand what exactly c-means clustering is. Fuzzy c-means is an extension of k-means clustering, the popular simple clustering technique. Fuzzy clustering, also referred to as soft clustering, is a form of clustering in which each data point can belong to more than one cluster.
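To make soft assignment concrete, here is a minimal sketch of how one point can get a degree of membership in every cluster, with the degrees adding up to 1. It uses the standard fuzzy c-means membership formula with a fuzzifier m = 2, which the talk does not spell out, and the cluster centers are hypothetical.

```python
from math import dist

def memberships(point, centers, m=2.0):
    """Degree of membership of `point` in each cluster (values sum to 1)."""
    d = [dist(point, c) for c in centers]
    zeros = d.count(0.0)
    if zeros:  # the point sits exactly on a center: all membership goes there
        return [1.0 / zeros if x == 0.0 else 0.0 for x in d]
    power = 2.0 / (m - 1.0)
    # Closer centers get a larger share of the membership.
    return [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]

centers = [(0.0, 0.0), (10.0, 0.0)]     # hypothetical cluster centers
u = memberships((4.0, 0.0), centers)
print(u, sum(u))
```

The point is closer to the first center, so it belongs mostly, but not only, to the first cluster.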

So k-means tries to find hard clusters, where each point belongs to one cluster, whereas fuzzy c-means discovers soft clusters. In a soft cluster, any point can belong to more than one cluster at a time, with a certain affinity value towards each. Fuzzy c-means assigns a degree of membership, which ranges from 0 to 1, of an object to a given cluster. There is a stipulation that the sum of the memberships of an object to all the clusters it belongs to must be equal to 1. So if the degrees of membership of this particular point to the two clusters are 0.6 and 0.4, and you add them up, we get 1. That is one part of the logic behind fuzzy c-means. This affinity depends on the distance from the point to the center of the cluster: the closer the point is to a center, the higher its membership in that cluster.

Now we have the pros and cons of fuzzy c-means. First of all, it allows a data point to be in multiple clusters. That's a pro.

It's a more natural representation of the behavior of genes, since genes are usually involved in multiple functions, so it is a very good type of clustering when we're talking about genes.

If we talk about the cons, again we have to define c, which is the number of clusters, same as K. Next, we need to determine the membership cutoff value as well, which is time-consuming. The clusters are also sensitive to the initial assignment of centroids, so a slight change or deviation in the initial centers is going to result in a very different kind of output from fuzzy c-means. And one of the major disadvantages of c-means clustering is that it is a non-deterministic algorithm.

So it does not give you one particular output every time, and that's that. Now let's have a look at the third type of clustering, which is hierarchical clustering.

Hierarchical clustering is an alternative approach which builds a hierarchy from the bottom up or from the top down, and it does not require us to specify the number of clusters beforehand. The algorithm works like this: first of all, we put each data point in its own cluster, identify the closest two clusters and combine them into one cluster, and repeat the above step till all the data points are in a single cluster.

There are two types of hierarchical clustering: one is agglomerative clustering and the other is divisive clustering. Agglomerative clustering builds the dendrogram from the bottom level, while divisive clustering starts with all the data points in one cluster, the root cluster. Now again, hierarchical clustering also has some pros and cons.
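The bottom-up procedure described above (each point in its own cluster, merge the closest two, repeat) can be sketched as follows. This is only an illustration: it assumes 1-D points and single-linkage distance (the gap between the two closest members), neither of which is specified in the talk.

```python
def agglomerative(points):
    """Bottom-up clustering of 1-D points; returns the list of merges."""
    clusters = [[p] for p in points]          # each point starts in its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Replace the two clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

merges = agglomerative([1.0, 2.0, 9.0, 10.0, 25.0])
for left, right, d in merges:
    print(left, "+", right, "merged at distance", d)
```

Reading the merges from first to last gives the dendrogram from the bottom level upwards. The nested loops also hint at why the method gets slow on large data sets.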

So in the pros, no assumption of a particular number of clusters is required, and the hierarchy may correspond to meaningful taxonomies. Whereas if we talk about the cons, once a decision is made to combine two clusters, it cannot be undone, and one of the major disadvantages of hierarchical clustering is that it becomes very slow if we talk about very, very large data sets. Nowadays I think every industry is using large data sets and collecting large amounts of data, so hierarchical clustering is not the apt or the best method someone might go for. So there's that.

Now, when we talk about unsupervised learning, we have k-means clustering, and there is another important term which people usually miss while talking about unsupervised learning: the very important concept of market basket analysis.

Now, it is one of the key techniques used by large retailers to uncover associations between items. It works by looking for combinations of items that occur together frequently in transactions. To put it another way, it allows retailers to identify the relationships between the items that people buy. For example, people who buy bread also tend to buy butter. The marketing team at the retail store should target customers who buy bread and butter and provide them an offer so that they buy a third item, like eggs.

So if a customer buys bread and butter and sees a discount or an offer on eggs, he will be encouraged to spend more money and buy the eggs. This is what market basket analysis is all about.

To find the association between items and make predictions about what customers will buy, there are two algorithms: association rule mining and the Apriori algorithm. So let's discuss each of these with an example.

First of all, association rule mining is a technique that shows how items are associated with each other. For example, customers who purchase bread have a 60 percent likelihood of also purchasing jam, and customers who purchase a laptop are more likely to purchase laptop bags. If we take an example of an association rule, A -> B, it means that if a person buys an item A, then he will also buy an item B.

Now, there are three common ways to measure a particular association, because we have to find these rules on the basis of some statistics, right? So what we do is use support, confidence and lift. These three measures let us look at a rule from association rule mining and know exactly how good that rule is.

First of all, we have support. Support gives the fraction of the transactions which contain both item A and item B, so it's basically the frequency of the item set in the whole set of transactions. Confidence gives how often the items A and B occur together, given the number of times A occurs, so it's frequency(A, B) divided by frequency(A). Lift indicates the strength of the rule over the random co-occurrence of A and B.
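The three measures can be written down in a few lines. The sketch below is only illustrative: the baskets are invented, and the function names are chosen for this example.

```python
# Hypothetical baskets, invented for illustration only.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "eggs"},
    {"bread", "jam"},
    {"eggs", "jam"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b):
    """How often B appears in the transactions that contain A."""
    return support(set(a) | set(b)) / support(a)

def lift(a, b):
    """Joint support relative to what independent A and B would give."""
    return support(set(a) | set(b)) / (support(a) * support(b))

a, b = {"bread"}, {"butter"}
print(support(a | b), confidence(a, b), lift(a, b))
```

A lift above 1 says that bread and butter turn up together more often than they would if the two purchases were unrelated.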

If you have a close look at the denominator of the lift formula here, we have support(A) x support(B). A major thing to note is that the supports of A and B are taken independently here. So if the denominator of the lift is large, it means that the items are selling more independently, not together, and that in turn will decrease the value of lift. Suppose instead the value of lift is high. That implies that the rule is strong and can be used for later purposes, because in that case support(A) x support(B), the denominator of lift, is low compared with the joint support, which in turn means that there is a relationship between the items A and B.

So let's take an example of association rule mining and understand how exactly it works.

So let's suppose we have a set of items A, B, C, D and E, and a set of transactions T1, T2, T3, T4 and T5. What we need to do is create some rules. For example, you can see A -> D, which means that if a person buys A, he buys D; C -> A, if a person buys C, he buys A; A -> C, if a person buys A, he buys C; and the fourth one, B, C -> A, if a person buys B and C, he will in turn buy A. What we need to do is calculate the support, confidence and lift of these rules. Here again, we talk about the Apriori algorithm.

So the Apriori algorithm and association rule mining go hand in hand. What the Apriori algorithm does is use the frequent itemsets to generate the association rules, and it is based on the concept that a subset of a frequent itemset must also be a frequent itemset. So let's understand what a frequent itemset is and how all of these work together.

We take the following transactions of items: we have transactions T1 to T5, and the items are {1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5} and {1, 3, 5}. Another important thing about support which I forgot to mention is that, when talking about association rule mining, there is a minimum support count that we need to define.

Now, the first step is to build a list of item sets of size 1 using this transaction data set and use the minimum support count to filter it. Let's see how we do that. If we create the table C1 and have a close look at it, we have the item set {1}, which has support 3 because it appears in transactions T1, T3 and T5. Similarly, if you have a look at the single item 3, it has a support of 4: it appears in T1, T2, T3 and T5. But if we have a look at the item set {4}, it only appears in one transaction, so its support value is 1. Now the item sets with a support value less than the minimum support value, that is 2, have to be eliminated.

So the final table, which is table F1, has 1, 2, 3 and 5; it does not contain 4. What we're going to do now is create the item sets of size 2, and all the combinations of the item sets in F1 are used in this iteration. We've left 4 behind, so we just have 1, 2, 3 and 5, and the possible item sets are {1, 2}, {1, 3}, {1, 5}, {2, 3}, {2, 5} and {3, 5}. Then again, we calculate the support. In this case, if we have a closer look at the table C2, we see that the item set {1, 2} has a support value of 1, so it has to be eliminated, and the final table F2 does not contain {1, 2}.

Similarly, we create the item sets of size 3 and calculate their support values, but before calculating the support, let's perform pruning on the data set. Now what's pruning? After all the combinations are made, we go through the item sets in table C3 to check if any of them has a subset whose support is less than the minimum support value. This is the Apriori algorithm at work.

So in the item set {1, 2, 3} we can see that we have {1, 2}, and in {1, 2, 5} again we have {1, 2}, so we discard both of these item sets and we're left with {1, 3, 5} and {2, 3, 5}. With {1, 3, 5} we have three subsets, {1, 5}, {1, 3} and {3, 5}, which are present in table F2. Then again, for {2, 3, 5} we have {2, 3}, {2, 5} and {3, 5}, which are also present in table F2. So we remove the item sets containing {1, 2} from the table C3 and create the table F3.

Now we use the item sets of F3 to create the item sets of C4. What we find is the item set {1, 2, 3, 5}, whose support value is 1, which is less than the minimum support value of 2. So we stop here and return to the previous item sets, that is, the table F3. The final table F3 has {1, 3, 5} with a support value of 2 and {2, 3, 5} with a support value of 2. What we're going to do now is generate all the subsets of each frequent itemset.
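The table-by-table walkthrough above (C1 to F1, C2 to F2, C3 to F3) can be reproduced with a short sketch. It uses the five transactions and the minimum support count of 2 from the talk; the function names and the exact way candidates are joined are choices made for this illustration, not something the video shows.

```python
from itertools import combinations

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
MIN_SUPPORT = 2  # minimum support count used in the walkthrough

def support_count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions)

def apriori_levels():
    """Return [F1, F2, ...], each a dict {frozenset: support count}."""
    items = sorted(set().union(*transactions))
    candidates = [frozenset([i]) for i in items]            # table C1
    levels = []
    k = 1
    while candidates:
        counted = {c: support_count(c) for c in candidates}
        frequent = {c: n for c, n in counted.items() if n >= MIN_SUPPORT}
        if not frequent:
            break
        levels.append(frequent)
        k += 1
        # Join step: unions of frequent sets that have exactly k items.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-item subset must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))]
    return levels

levels = apriori_levels()
for level in levels:
    print({tuple(sorted(s)): n for s, n in level.items()})
```

One small difference from the spoken walkthrough: here {1, 2, 3, 5} is dropped by the prune step before its support is ever counted, whereas the talk counts its support (1) and rejects it. Either way the result stops at F3.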

So let's assume that the minimum confidence value is 60%. For every non-empty subset S of I, the output rule is S -> (I - S), that is, S recommends I - S, if support(I) / support(S) is greater than or equal to the minimum confidence value; only then do we proceed further. Keep in mind that we have not used lift till now; we are only working with support and confidence.

Applying the rules to the item sets of F3, we get rule 1, which is {1, 3} -> ({1, 3, 5} - {1, 3}), that is, {1, 3} -> {5}. It means if you buy 1 and 3, there's a 66% chance that you will buy item 5 also. Similarly, the rule {1, 5} -> {3} means that if you buy 1 and 5, there's a hundred percent chance that you will buy 3 also. If we have a look at rules 5 and 6 here, the confidence value is less than 60 percent, which was the assumed minimum confidence value, so we reject these rules. An important thing to note here is that, if you have a closer look at rule 5 and rule 3, they are built from the same items 1, 3 and 5, just arranged differently on the two sides of the arrow.
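The confidence check on the frequent item set {1, 3, 5} can be sketched like this, using the same five transactions and the 60% minimum confidence from the talk. The order in which the sketch lists the rules is its own and does not follow the rule numbering on the slide.

```python
from itertools import combinations

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
MIN_CONFIDENCE = 0.6

def support_count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(set(itemset) <= t for t in transactions)

def rules_from(itemset):
    """Yield (S, I - S, confidence) for every non-empty proper subset S of I."""
    itemset = frozenset(itemset)
    for size in range(1, len(itemset)):
        for s in combinations(sorted(itemset), size):
            s = frozenset(s)
            conf = support_count(itemset) / support_count(s)
            yield s, itemset - s, conf

kept = []
for s, rest, conf in rules_from({1, 3, 5}):
    verdict = "keep" if conf >= MIN_CONFIDENCE else "reject"
    print(sorted(s), "->", sorted(rest), f"confidence={conf:.2f}", verdict)
    if conf >= MIN_CONFIDENCE:
        kept.append((s, rest))
```

The two rejected rules are the ones with 3 alone or 5 alone on the left-hand side: each of those items appears in four transactions, so the confidence is only 2/4.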

Rules like these can look very confusing, so one thing to keep in mind is that the order of the item sets is also very important; that will help us create good rules and avoid any kind of confusion. So that's that.

Now let's learn how association rules are used in market basket analysis problems. We will be using the online transactions data of a retail store for generating association rules. First of all, you need to import the pandas and mlxtend libraries and read the data. Then, from mlxtend.frequent_patterns, we're going to import apriori and association_rules.

As you can see here, we have the head of the data: invoice number, stock code, description, quantity, invoice date, unit price, customer ID and country. In the next step, we do the data cleanup, which includes removing spaces from some of the descriptions, dropping the rows that do not have invoice numbers, and removing the credit transactions. So what we're going to do is remove the rows which do not have any invoice number, and if the invoice number, treated as a string, contains a C, then we're going to remove that.

Those are the credits. We also remove any kind of spaces from the descriptions. As you can see here, we have about five hundred and thirty-two thousand rows with eight columns.

Next, after the cleanup, we need to consolidate the items into one transaction per row, with each product as a column. For the sake of keeping the data set small, we're going to look only at the sales for France. So we take only France and group by invoice number and description, with the quantity summed up, which leaves us with 392 rows and 1563 columns.
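The consolidation step, together with the 0/1 encoding that the demo applies right after it, could look roughly like this in pandas. The tiny data frame is made up, and the column spellings (`InvoiceNo`, `Description`, `Quantity`, `Country`) are assumptions based on the column names read out in the demo.

```python
import pandas as pd

# Tiny made-up sample standing in for the retail transactions.
df = pd.DataFrame({
    "InvoiceNo":   ["1001", "1001", "1002", "1002", "1003"],
    "Description": ["ALARM CLOCK", "LUNCH BOX", "ALARM CLOCK", "ALARM CLOCK", "LUNCH BOX"],
    "Quantity":    [2, 1, 3, 1, 6],
    "Country":     ["France", "France", "France", "France", "Germany"],
})

basket = (
    df[df["Country"] == "France"]
    .groupby(["InvoiceNo", "Description"])["Quantity"]
    .sum()
    .unstack()      # one column per product, one row per invoice
    .fillna(0)      # products missing from an invoice get quantity 0
)

def encode_units(x):
    """Turn a quantity into a 0/1 'was purchased' flag."""
    return 1 if x >= 1 else 0

basket_sets = basket.apply(lambda col: col.map(encode_units))
print(basket_sets)
```

Each row of `basket_sets` is now one transaction and each column one product, which is the shape that frequent item set mining expects.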

Now, there are a lot of zeros in the data, but we also need to make sure any positive values are converted to 1 and anything less than or equal to 0 is set to 0. For that we define a function, encode_units: if x is less than or equal to 0, return 0; if x is greater than or equal to 1, return 1. Then we map and apply it to the whole data set we have here.

Now that we have structured the data properly, the next step is to generate the frequent item sets that have a support of at least seven percent. This number is chosen so that we get enough useful examples. Next, we generate the rules with their corresponding support, confidence and lift. We had given the minimum support as 0.07; the metric is lift on the frequent item sets, and the threshold is 1. So these are the resulting rules.

A few rules have a high lift value, which means that the combination occurs more frequently than would be expected given the number of transactions and product combinations, and in most of these cases the confidence is high as well. So these are a few observations we get here. If we filter the data frame using standard pandas code for a large lift (6) and a high confidence (0.8), this is what the output is going to look like. These are 1, 2, 3, 4, 5, 6, 7, 8: as you can see, we have the eight rules which are the final rules given by association rule mining.

This is how all the industries, and especially the large retailers we've talked about, get to know how their products are used and how exactly they should rearrange them and provide offers on the products so that people spend more and more money and time in the shop. So that was all about association rule mining.

So guys, that's all for unsupervised learning. I hope you got to know about the different formulas and how unsupervised learning works, because, you know, we did not provide any labels to the data. All we did was create some rules without knowing what the data is, and we did different types of clustering: k-means, c-means and hierarchical clustering.

Reinforcement learning is a part of machine learning where an agent is put in an environment and learns to behave in this environment by performing certain actions. So it basically performs actions, and it either gets rewards for those actions or gets a punishment, and it observes the reward which it gets from those actions. Reinforcement learning is all about taking an appropriate action in order to maximize the reward in a particular situation.

So guys, in supervised learning the training data comprises the input and the expected output, and so the model is trained with the expected output itself. But when it comes to reinforcement learning, there is no expected output. Here the reinforcement agent decides what actions to take in order to perform a given task. In the absence of a training data set, it is bound to learn from its experience itself. All right. So reinforcement learning is all about an agent who is put in an unknown environment and uses a hit-and-trial method in order to figure out the environment and then come up with an outcome. Okay. Now, let's look at reinforcement learning with an analogy.

So consider a scenario wherein a baby is learning how to walk. The scenario can go about in two ways. In the first case, the baby starts walking and makes it to the candy. Here the candy is basically the reward it's going to get, and since the candy is the end goal, the baby is happy. It's positive. Okay, so the baby is happy and it gets rewarded with a set of candies. Another way in which this could go is that the baby starts walking but falls due to some hurdle in between. The baby gets hurt and doesn't get any candy, and obviously the baby is sad. So this is a negative reward, or you can say this is a setback.

So just like how we humans learn from our mistakes by trial and error, reinforcement learning is also similar. Okay, so we have an agent, which is basically the baby, and a reward, which is the candy over here. With many hurdles in between, the agent is supposed to find the best possible path to reach the reward. So guys, I hope you all are clear with reinforcement learning. Now, let's look at the reinforcement learning process.

So generally a reinforcement learning system has two main components. The first is an agent and the second one is an environment. In the previous case, we saw that the agent was a baby and the environment was the living room wherein the baby was crawling. The environment is the setting that the agent is acting on, and the agent over here represents the reinforcement learning algorithm.

So guys, the reinforcement learning process starts when the environment sends a state to the agent, and then the agent takes some action based on its observations. In turn, the environment sends the next state and the respective reward back to the agent. The agent updates its knowledge with the reward returned by the environment and uses that to evaluate its previous action. This loop keeps continuing until the environment sends a terminal state, which means that the agent has accomplished all its tasks and finally gets the reward. This is exactly what was depicted in this scenario: the agent keeps climbing up ladders until he reaches his reward.

To understand this better, let's suppose that our agent is learning to play Counter-Strike. Okay, so let's break it down. Initially the RL agent, which is basically the player, let's say player 1, who is trying to learn how to play the game, collects some state from the environment. This could be the first stage of Counter-Strike. Based on the state, the agent takes some action, and this action can be anything that causes a result. So if the player moves left or right, that's also considered an action. Initially the action is going to be random, because obviously the first time you pick up Counter-Strike, you're not going to be a master at it. So you're going to try different actions, and you're just going to pick a random action in the beginning.

Now the environment is going to give a new state. So after the player clears that stage, the environment gives a new state to the agent, or to the player. Maybe he has crossed stage 1 and now he's in stage 2. So the player gets a reward R1 from the environment because he cleared stage 1. This reward can be anything: additional points or coins or anything like that. Basically this loop keeps going on until the player is dead or reaches the destination, and it continuously outputs a sequence of states, actions and rewards.

So guys, this was a small example to show you how the reinforcement learning process works. You start with an initial state, and once the player clears that state he gets a reward. After that the environment gives another state to the player, and after he clears that state he gets another reward, and this keeps happening until the player reaches his destination.
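The state, action, reward loop described above can be sketched with a toy environment. Everything here is hypothetical: the `StageEnv` class, its two actions and the reward of 1 per cleared stage are invented for illustration, and the agent simply picks random actions, as a beginner would.

```python
import random

class StageEnv:
    """Toy environment: the state is the current stage, 0 .. n_stages."""
    def __init__(self, n_stages=3):
        self.n_stages = n_stages
        self.stage = 0

    def reset(self):
        self.stage = 0
        return self.stage

    def step(self, action):
        """Return (next_state, reward, done) for the chosen action."""
        if action == "advance":
            self.stage += 1
            reward = 1          # reward for clearing a stage
        else:
            reward = 0          # waiting changes nothing
        done = self.stage == self.n_stages   # terminal state reached
        return self.stage, reward, done

def run_episode(env, rng, max_steps=100):
    state = env.reset()
    history = []                # sequence of (state, action, reward)
    for _ in range(max_steps):
        action = rng.choice(["advance", "wait"])   # random policy at the start
        next_state, reward, done = env.step(action)
        history.append((state, action, reward))
        state = next_state
        if done:
            break
    return history

history = run_episode(StageEnv(), random.Random(0))
print(history)
print("total reward:", sum(r for _, _, r in history))
```

The printed history is exactly the sequence of states, actions and rewards that the loop keeps producing until the terminal state.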

All right, so guys, I hope this is clear. Now, let's move on and look at the reinforcement learning definitions. There are a few concepts that you should be aware of while studying reinforcement learning, so let's look at those definitions over here. First we have the agent. An agent is basically the reinforcement learning algorithm that learns from trial and error. An agent takes actions, like for example a soldier in Counter-Strike navigating through the game.

That's also an action. If he moves left or right, or if he shoots at somebody, that's also an action. So the agent is responsible for taking actions in the environment.

Now the environment is the whole Counter-Strike game. It's basically the world through which the agent moves. The environment takes the agent's current state and action as input and returns the agent's reward and its next state as output.

Next we have action. All the possible steps that an agent can take are called actions. Like I said, it can be moving right or left, or shooting, or any of that.

Then we have state. A state is basically the current condition returned by the environment. So whichever stage you are in, whether you are in stage 1 or stage 2, that represents your current condition.

Next we have reward. A reward is basically an instant return from the environment to appraise your last action. It can be anything like coins or additional points.

So basically a reward is given to an agent after it clears specific stages.

Next we have policy. Policy is basically the strategy that the agent uses to find out his next action based on his current state. Policy is just the strategy with which you approach the game.

Then we have value. Value is the expected long-term return with discount. Value and action value can be a little bit confusing for you right now, but as we move further, you'll understand what I'm talking about. So value is basically the long-term return that you get with discount; discount I'll explain in the further slides.

Then we have action value. Action value is also known as Q value. It's very similar to value, except that it takes an extra parameter, which is the current action. So basically here you'll find out the Q value depending on the particular action that you took. So guys, don't get confused with value and action value.

We'll look at examples in the further slides and you will understand this better. So guys, make sure that you're familiar with these terms, because you'll be seeing a lot of them in the further slides.

Now, before we move any further, I'd like to discuss a few more concepts. First we will discuss reward maximization. If you haven't already realized it, the basic aim of the RL agent is to maximize the reward. Now, how does that happen? Let's try to understand this in depth. The agent must be trained in such a way that he takes the best action so that the reward is maximum, because the end goal of reinforcement learning is to maximize your reward based on a set of actions.

Let me explain this with a small game. In the figure you can see there is a fox, there's some meat and there's a tiger. Our agent is basically the fox, and his end goal is to eat the maximum amount of meat before being eaten by the tiger. Since the fox is a clever fellow, he eats the meat that is closer to him rather than the meat which is closer to the tiger.

Now this is because the closer he is to the tiger, the higher are his chances of getting killed. Because of this, the rewards which are near the tiger, even if they are bigger meat chunks, will be discounted. This is exactly what discounting means. So our agent is not going to eat the meat chunks which are closer to the tiger because of the risk. Even though those meat chunks might be larger, he does not want to take the chance of getting killed. This is called discounting: you discount the risky rewards and just eat the meat which is closer to you, instead of taking risks and eating the meat which is closer to your opponent.

All right. Now the discounting of rewards works based on a value called gamma. We'll be discussing gamma in our further slides, but in short, the value of gamma is between 0 and 1. The smaller the gamma, the larger the discount. So if the gamma value is smaller, the agent gives little weight to far-off rewards, which means he is not going to explore and he's not going to try and eat the meat chunks which are closer to the tiger. But if the gamma value is closer to 1, it means that our agent is actually going to explore, and it's going to try and eat the meat chunks which are closer to the tiger.

I'll be explaining this in depth in the further slides, so don't worry if you haven't got a clear concept yet. Just understand that reward maximization is a very important step when it comes to reinforcement learning, because the agent has to collect maximum rewards by the end of the game.
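A small numeric sketch can show what gamma does. It assumes the usual reinforcement learning convention, where a reward that arrives t steps in the future is multiplied by gamma to the power t. The talk motivates discounting through the risk of the tiger, so treat this as a simplified stand-in, with made-up reward numbers.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * reward_t over the reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Made-up numbers: a small chunk right away vs. a big chunk four steps away.
near = [1, 0, 0, 0, 0]
far = [0, 0, 0, 0, 5]

for gamma in (0.5, 0.9):
    print(gamma, discounted_return(near, gamma), discounted_return(far, gamma))
```

With the smaller gamma the nearby chunk is worth more after discounting, and with gamma close to 1 the distant, bigger chunk wins.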

All right. Now, let's look at another concept, which is called exploration and exploitation. Exploration, like the name suggests, is about exploring and capturing more information about an environment. Exploitation, on the other hand, is about using the already known, exploited information to heighten the rewards.

So guys, consider the fox and tiger example that we discussed. Here the fox eats only the meat chunks which are close to him, but he does not eat the meat chunks which are closer to the tiger, even though they might give him more rewards. If the fox only focuses on the closest rewards, he will never reach the big chunks of meat.

Okay, this is what exploitation is about: you're just going to use the currently known information and try to get rewards based on that information. But if the fox decides to explore a bit, it can find the bigger reward, which is the big chunks of meat. This is exactly what exploration is. So the agent is not going to stick to one corner; instead, he's going to explore the entire environment and try to collect bigger rewards. All right, so guys, I hope you all are clear with exploration and exploitation.
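One common way to mix the two behaviours is an epsilon-greedy rule: with a small probability epsilon the agent tries a random action (exploration), otherwise it takes the action it currently believes is best (exploitation). The talk does not name this rule, so the sketch below is an outside illustration, and the reward estimates are hypothetical.

```python
import random

def epsilon_greedy(estimates, epsilon, rng):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))                          # explore
    return max(range(len(estimates)), key=lambda a: estimates[a])     # exploit

# Hypothetical current estimates of the reward for each choice.
estimates = [1.0, 0.2]   # index 0 = meat near the fox, 1 = meat near the tiger
rng = random.Random(0)

greedy_picks = [epsilon_greedy(estimates, 0.0, rng) for _ in range(1000)]
mixed_picks = [epsilon_greedy(estimates, 0.2, rng) for _ in range(1000)]

# Number of times the far chunk (index 1) was tried under each setting.
print(sum(greedy_picks), sum(mixed_picks))
```

With epsilon set to 0 the fox never tries the far chunk, so it can never find out that its estimate for that chunk might be too low.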

Now, let's look at the Markov decision process. So guys, this is basically a mathematical approach for mapping a solution in reinforcement learning. In a way, the purpose of reinforcement learning is to solve a Markov decision process. There are a few parameters that are used to get to the solution. The parameters include the set of actions, the set of states, the rewards, the policy that you're taking to approach the problem, and the value that you get.

To sum it up, the agent must take an action A to transition from the start state to the end state S. While doing so, the agent receives a reward R for each action that he takes. The series of actions taken by the agent defines the policy, or the approach, and the rewards that are collected define the value. So the main goal here is to maximize the rewards by choosing the optimum policy.

All right. Now, let's try to understand this with the help of the shortest path problem. I'm sure a lot of you might have gone through this problem when you were in college. So guys, look at the graph over here. Our aim here is to find the shortest path between A and D with the minimum possible cost. The value that you see on each of these edges basically denotes the cost. So if I want to go from A to C, it's going to cost me 15 points. Let's look at how this is done.

In this problem the set of states is denoted by the nodes, which are A, B, C and D, and the action is to traverse from one node to another. So if I'm going from A to B, that's an action; similarly, A to C is an action.

Okay, the reward is basically the cost, which is represented by each edge over here. The policy is basically the path that I choose to reach the destination. So let's say I choose A, C, D: that's one policy in order to get to D, and any other path I choose is also a policy. It's basically how I'm approaching the problem.

So guys, here you can start off at node A and take baby steps to your destination. Initially you're clueless, so you can just take the next possible node which is visible to you. If you're smart enough, you're going to choose A to C instead of A, B, C, D or A, B, D.

All right. So now, if you are at node C and you want to traverse to node D, you must again choose a wise path. You just have to calculate which path has the highest cost, or which path will give you the maximum rewards. So guys, this is a simple problem. We're just trying to calculate the shortest path between A and D by traversing through these nodes. If I traverse A, C, D, it gives me the maximum reward: it gives me 65, which is more than any other policy would give me. If I go A, B, D, it would be 40, so when you compare, A, C, D gives me more reward, and obviously I'm going to go with A, C, D. Okay, so guys, this was a simple problem in order to understand how a Markov decision process works.
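The idea of comparing policies on a small graph can be sketched by listing every path from A to D and adding up its cost. The edge costs below are hypothetical, apart from A to C = 15, which is the only value read out in the talk; they are not the numbers from the figure. The sketch treats the best policy as the cheapest path, which is the same as the highest reward if each reward is taken as the negative of the cost.

```python
# Edge costs are hypothetical, apart from A -> C = 15 mentioned in the talk.
graph = {
    "A": {"B": 20, "C": 15},
    "B": {"C": 5, "D": 30},
    "C": {"D": 10},
    "D": {},
}

def all_paths(node, goal, path=()):
    """Yield every path from `node` to `goal` (this graph has no cycles)."""
    path = path + (node,)
    if node == goal:
        yield path
        return
    for nxt in graph[node]:
        yield from all_paths(nxt, goal, path)

def cost(path):
    """Total cost of the edges along `path`."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))

# Each path from A to D is one policy; score them all and pick the cheapest.
policies = {"".join(p): cost(p) for p in all_paths("A", "D")}
best = min(policies, key=policies.get)
print(policies, best)
```

Listing every path works for four nodes, but it stops being practical once there are hundreds of nodes and many possible policies.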

What do you think I did here? Did I perform exploration or did I perform exploitation? The policy for the above example is one of exploitation, because we didn't explore the other nodes. We just selected three nodes and travelled through them, so that's why this is called exploitation. We must always explore the different nodes so that we find a more optimal policy. In this case, obviously A, C, D has the highest reward and we're going with A, C, D, but generally it's not so simple. There are a lot of nodes, hundreds of nodes to traverse, and there are like 50 or 60 different policies. So make sure you explore through all the policies and then decide on an optimum policy which will give you the maximum reward.

For a robot, an environment is a place where it has been put to use.

Remember this reward is itself the agent for example

an automobile Factory where a robot is used

to move materials from one place to another now

the task we discussed just now have a property in common. Now, these tasks involve

and environment and expect the agent to learn

from the environment. Now, this is where traditional

machine learning phase and hence the need

for reinforcement learning now. It is good to have

an established overview of the problem. That is to be Of using

the Q learning or the reinforcement learning so it helps to define

the main components of a reinforcement

learning solution. That is the agent environment

action rewards and States. So let's suppose we are to build a few autonomous robots for

an automobile building Factory. Now, these robots will help

the factory personal by conveying them

the necessary parts that they would need

in order to pull the car.

Now, these different parts are located at nine different positions within the factory warehouse. The car parts include the chassis, wheels, dashboard, the engine, and so on, and the factory workers have prioritized the location that contains the body, or the chassis, to be the topmost. They have provided priorities for the other locations as well, which we'll look into in a moment. Now, these locations within the factory look somewhat like this. As you can see here, we have L1, L2, L3, all of these stations. One thing you might notice is that there are little obstacles present in between the locations. L6 is the top-priority location; it contains the chassis for preparing the car bodies. Now, the task is to enable the robots to find the shortest route from any given location to another location on their own. The agents in this case are the robots, and the environment is the automobile factory warehouse.

So let's talk about the states. The states are the locations in which a particular robot is present at a particular instant of time, which denote its states. Machines understand numbers rather than letters, so let's map the location codes to numbers. As you can see here, we have mapped location L1 to state 0, L2 to state 1, and so on; we have L8 as state 7 and L9 as state 8. Next, what we're going to talk about are the actions. In our example, the actions will be the direct locations that a robot can go to from a particular location. Consider a robot that is at location L2: the direct locations to which it can move are L5, L1, and L3.

Now, the figure here may come in handy to visualize this. As you might have already guessed, the set of actions here is nothing but the set of all possible states of the robot. For each location, the set of actions that a robot can take will be different. For example, the set of actions will change if the robot is in L1 rather than L2: if the robot is in L1, it can only go to L4 and L2 directly.

Now that we are done with the states and the actions, let's talk about the rewards. The states are basically 0, 1, 2, 3, 4 and so on, and the actions are also 0, 1, 2, 3, 4, up to 8. A reward will be given to a robot if a location, which is the state, is directly reachable from a particular location. Let's take an example: suppose L9 is directly reachable from L8. If a robot goes from L8 to L9, or vice versa, it will be rewarded with 1, and if a location is not directly reachable from a particular location, we do not give any reward, that is, a reward of 0. The reward is just a number and nothing else. It enables the robots to make sense of their movements, helping them decide which locations are directly reachable and which are not. With this cue, we can construct a reward table that contains all the reward values mapped between all possible states. As you can see here in the table, the positions marked green have a positive reward, and we have all the possible rewards that a robot can get by moving between the different states.

Now comes an interesting decision. Remember that the factory administrator prioritized L6 to be the topmost. So how do we incorporate this fact into the above table? This is done by associating the topmost-priority location with a much higher reward than the usual ones. So let's put 999 in the cell (L6, L6). Now the table of rewards, with a higher reward for the topmost location, looks something like this.
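The reward table described here can be sketched in a few lines of NumPy. The warehouse figure is not reproduced in this transcript, so the `neighbors` layout below is an assumption: it keeps the L2 and L8-L9 moves named in the talk and, in particular, treats L1-L4 as blocked by a barrier. The names `location_to_state`, `neighbors`, and `build_rewards` are illustrative, not taken from the on-screen code.

```python
import numpy as np

# Map location codes to state indices: L1 -> 0, L2 -> 1, ..., L9 -> 8.
location_to_state = {f"L{i}": i - 1 for i in range(1, 10)}

# ASSUMED layout: which locations are directly reachable from each location.
neighbors = {
    "L1": ["L2"],
    "L2": ["L1", "L3", "L5"],
    "L3": ["L2", "L6"],
    "L4": ["L7"],
    "L5": ["L2", "L8"],
    "L6": ["L3"],
    "L7": ["L4", "L8"],
    "L8": ["L5", "L7", "L9"],
    "L9": ["L8"],
}


def build_rewards(neighbors, location_to_state, priority=None, priority_reward=999):
    """Return a square table with 1 for directly reachable pairs and 0 otherwise."""
    n = len(location_to_state)
    rewards = np.zeros((n, n), dtype=int)
    for location, reachable in neighbors.items():
        for other in reachable:
            rewards[location_to_state[location], location_to_state[other]] = 1
    if priority is not None:
        # The top-priority location gets a much larger reward on its own cell.
        state = location_to_state[priority]
        rewards[state, state] = priority_reward
    return rewards


rewards = build_rewards(neighbors, location_to_state, priority="L6")
print(rewards)
```

Building the table from a neighbor list rather than typing the matrix by hand keeps the two directions of each move in one place, which makes a different barrier layout easy to try.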

We have now formally defined all the vital components for the solution we are aiming for. Now we will shift gears a bit and study some of the fundamental concepts that prevail in the world of reinforcement learning and Q-learning. First of all, we'll start with the Bellman equation. Consider the following square of rooms, which is analogous to the actual environment from our original problem, but without the barriers. Suppose a robot needs to go to the room marked in green from its current position A, using the specified direction. How can we enable the robot to do this programmatically? One idea would be to introduce some kind of footprint that the robot will be able to follow. Here a constant value is specified in each of the rooms that will come along the robot's way if it follows the direction specified above. In this way, if it starts at A, it will be able to scan through this constant value and move accordingly. But this will only work if the direction is prefixed and the robot always starts at location A. Now consider that the robot starts at this location rather than its previous one.

Now the robot sees footprints in two different directions. It is therefore unable to decide which way to go in order to get to the destination, which is the green room. This happens primarily because the robot does not have a way to remember the directions to proceed. So our job now is to enable the robot with a memory, and this is where the Bellman equation comes into play.

So as you can see here, the main purpose of the Bellman equation is to enable the robot with a memory. That's the thing we're going to use. The equation goes something like this:

$$V(s) = \max_a \big( R(s, a) + \gamma V(s') \big)$$

where $s$ is a particular state, which is a room; $a$ is the action of moving between the rooms; $s'$ is the state to which the robot goes from $s$; and $\gamma$ is the discount factor, which we'll get into in a moment. $R(s, a)$ is the reward function, which takes a state $s$ and an action $a$ and outputs the reward. $V(s)$ is the value of being in a particular state, which is the footprint. We consider all the possible actions and take the one that yields the maximum value. There is one constraint, however, regarding the value footprint: the room marked in yellow, just below the green room, will always have the value 1, to denote that it is one of the nearest rooms adjacent to the green room. This also ensures that a robot gets a reward when it goes from a yellow room to the green room.

Let's see how to make sense of the equation we have here. Let's assume a discount factor of 0.9; remember, gamma is the discount factor. Now, take the room just below the yellow room, which is the one with the asterisk mark. What will be $V(s)$, the value of being in that state? For this room, $V(s) = \max_a (0 + 0.9 \times 1) = 0.9$, where 0 is the initial $R(s, a)$, 0.9 is gamma, and 1 is the value of the yellow room. Here the robot will not get any reward for going to the state marked in yellow, hence $R(s, a)$ is 0, but the robot knows the value of being in the yellow room.

Hence $V(s')$ is 1. Following this for the other states, we get 0.9; if we put 0.9 into the equation again, we get 0.81, then 0.729, and then we reach the starting point again. So this is how the table looks with some value footprints computed from the Bellman equation. A couple of things to notice here: the max function makes the robot always choose the state that gives it the maximum value of being in that state, and the discount factor gamma notifies the robot about how far it is from the destination. Gamma is typically specified by the developer of the algorithm that would be installed in the robot. The other states can also be given their respective values in a similar way. As you can see here, the boxes adjacent to the green one have 1, and as we move away from 1 we get 0.9, 0.81, 0.729, and finally we reach 0.66.
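As a rough check on these numbers, the deterministic Bellman equation can be applied repeatedly to a toy corridor of rooms. The corridor layout, the function name `value_footprints`, and the convention that stepping into the green room pays a reward of 1 (which gives the adjacent room a value of 1) are assumptions made for this sketch; the grid in the talk is two-dimensional.

```python
def value_footprints(n_rooms, gamma=0.9, sweeps=50):
    """Values for a corridor of rooms 0..n_rooms-1; the green room sits just past room 0."""
    values = [0.0] * n_rooms
    for _ in range(sweeps):
        for s in range(n_rooms):
            candidates = []
            if s == 0:
                # Stepping into the green room: reward 1, and the episode ends there.
                candidates.append(1.0 + gamma * 0.0)
            else:
                # Stepping one room closer to the green room: no immediate reward.
                candidates.append(0.0 + gamma * values[s - 1])
            if s + 1 < n_rooms:
                # Stepping one room further away, if such a room exists.
                candidates.append(0.0 + gamma * values[s + 1])
            # Bellman equation: keep the best action's value.
            values[s] = max(candidates)
    return values


footprints = value_footprints(5)
print([round(v, 4) for v in footprints])  # [1.0, 0.9, 0.81, 0.729, 0.6561]
```

The last value, 0.6561, is the figure that is rounded to 0.66 in the talk.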

The robot can now proceed on its way to the green room using these value footprints, even if it is dropped at any arbitrary room in the given location. If a robot lands in the highlighted sky-blue area, it will still find two options to choose from, but eventually either of the paths will be good enough for the robot to take because of the way the value footprints are now laid out.

Now, one thing to note is that the Bellman equation is one of the key equations in the world of reinforcement learning and Q-learning. If we think realistically, our surroundings do not always work the way we expect; there is always a bit of stochasticity involved. This applies to a robot as well. Sometimes it might so happen that the robot's machinery gets corrupted. Sometimes the robot may come across some hindrance on its way that was not known to it beforehand. And sometimes, even if the robot knows that it needs to take the right turn, it will not. So how do we introduce this stochasticity in our case? Here comes the Markov decision process. Consider that the robot is currently in the red room and needs to go to the green room.

Now let's consider that the robot has a slight chance of malfunctioning and might take the left, the right, or the bottom turn instead of taking the upper turn in order to get to the green room from where it is now, which is the red room. The question is, how do we enable the robot to handle this when it is out in the given environment? This is a situation where the decision-making regarding which turn is to be taken is partly random and partly under the control of the robot: partly random because we are not sure when exactly the robot might malfunction, and partly under the control of the robot because it is still making the decision to take a turn on its own, with the help of the program embedded in it. So a Markov decision process is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision-making in situations where the outcomes are partly random and partly under the control of the decision maker.

Now we need to give this concept a mathematical shape, most likely an equation, which can then be taken further. You might be surprised that we can do this with the help of the Bellman equation with a few minor tweaks. If we look at the original Bellman equation, $V(s) = \max_a \big( R(s, a) + \gamma V(s') \big)$, what needs to be changed so that we can introduce some amount of randomness? As long as we are not sure when the robot might not take the expected turn, we are also not sure which room it might end up in, which is nothing but the room it moves to from its current room. At this point, according to the equation, we are not sure of $s'$, the next state or room, but we do know all the probable turns the robot might take. To incorporate each of these probabilities into the above equation, we need to associate a probability with each of the turns, to quantify that the robot has some x% chance of taking that turn. If we do so, we get

$$V(s) = \max_a \Big( R(s, a) + \gamma \sum_{s'} P(s, a, s') \, V(s') \Big)$$

Here $P(s, a, s')$ is the probability of moving from room $s$ to room $s'$ with the action $a$, and the summation is the expectation over the situations the robot incurs, which is the randomness. Now let's take a look at this example. When we associate probabilities with each of these turns, we essentially mean that there is an 80% chance that the robot will take the upper turn.

Now, if we put all the required values into our equation, we get

$$V(s) = \max_a \Big( R(s, a) + \gamma \big( 0.8\, V(\text{room up}) + 0.1\, V(\text{room down}) + 0.03\, V(\text{room left}) + 0.03\, V(\text{room right}) \big) \Big)$$

Note that the value footprints will now change due to the fact that we are incorporating stochasticity here.
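To make the expectation term concrete, here is a small sketch that evaluates $R(s, a) + \gamma \sum_{s'} P(s, a, s') V(s')$ for a single action. The probabilities are the ones quoted in the talk (as spoken, they add up to 0.96 rather than 1); the room values and the helper name `expected_value` are made up purely for illustration.

```python
def expected_value(reward, gamma, transitions):
    """Evaluate R(s, a) + gamma * sum of P(s, a, s') * V(s') for one action.

    transitions is a list of (probability, value_of_next_room) pairs.
    """
    return reward + gamma * sum(p * v for p, v in transitions)


# Hypothetical value footprints for the four neighbouring rooms.
v_up, v_down, v_left, v_right = 1.0, 0.81, 0.9, 0.9

q_up = expected_value(
    reward=0.0,
    gamma=0.9,
    transitions=[(0.8, v_up), (0.1, v_down), (0.03, v_left), (0.03, v_right)],
)
print(round(q_up, 4))
```

Taking the maximum of this quantity over all available actions gives $V(s)$ for the current room.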

But this time we will not calculate those value footprints; instead, we will let the robot figure them out. Up until this point, we have not considered rewarding the robot for its action of going into a particular room; we are only rewarding the robot when it gets to the destination. Ideally, there should be a reward for each action the robot takes, to help it better assess the quality of its actions. The rewards need not always be the same, but having some amount of reward for the actions is much better than having no rewards at all. This idea is known as the living penalty. In reality, the reward system can be very complex, and modeling sparse rewards in particular is an active area of research in the domain of reinforcement learning.

So by now we have got the equation, and what we're going to do now is transition to Q-learning. This equation gives us the value of going to a particular state, taking the stochasticity of the environment into account. We have also learned very briefly about the idea of the living penalty, which deals with associating each move of the robot with a reward. Q-learning poses the idea of assessing the quality of an action that is taken to move to a state, rather than determining the possible value of the state being moved to. Earlier we had $0.8\, V(s_1)$, $0.03\, V(s_2)$, $0.1\, V(s_3)$, and so on. If we incorporate the idea of assessing the quality of the action for moving to a certain state, the environment with the agent and the quality of the actions will look something like this.

So instead of $0.8\, V(s_1)$ we'll have $Q(s_1, a_1)$, and likewise $Q(s_2, a_2)$, $Q(s_3, a_3)$, and so on. The robot now has four different states to choose from, and along with that there are four different actions for the current state it is in. So how do we calculate $Q(s, a)$, the cumulative quality of the possible actions the robot might take? Let's break it down. From the equation

$$V(s) = \max_a \Big( R(s, a) + \gamma \sum_{s'} P(s, a, s') \, V(s') \Big)$$

if we discard the max function, we are left with $R(s, a) + \gamma \sum_{s'} P(s, a, s') \, V(s')$. Essentially, in the equation that produces $V(s)$ we are considering all possible actions and all possible states from the current state the robot is in, and then taking the maximum value caused by taking a certain action. Without the max, the expression produces a value footprint for just one possible action.

In fact, we can think of it as the quality of the action:

$$Q(s, a) = R(s, a) + \gamma \sum_{s'} P(s, a, s') \, V(s')$$

Now that we have an equation to quantify the quality of a particular action, we are going to make a little adjustment to it. We can say that $V(s)$ is the maximum of all the possible values of $Q(s, a)$. So let's use this fact and replace $V(s')$ with a function of $Q$:

$$Q(s, a) = R(s, a) + \gamma \sum_{s'} P(s, a, s') \max_{a'} Q(s', a')$$

So the equation of $V$ has now turned into an equation of $Q$, which is the quality. But why would we do that? This is done to ease our calculations, because now we have only one function, $Q$, to calculate, which is also the core of dynamic programming, and $R(s, a)$ is a quantified metric that produces the reward of moving to a certain state.

Now, the qualities of the actions are called the Q-values, and from now on we will refer to the value footprints as the Q-values. An important piece of the puzzle is the temporal difference. The temporal difference is the component that will help the robot calculate the Q-values with respect to the changes in the environment over time. Consider that our robot is currently in the marked state and wants to move to the upper state. One thing to note here is that the robot already knows the Q-value of making the action, that is, of moving to the upper state. We also know that the environment is stochastic in nature, so the reward that the robot gets after moving to the upper state might be different from an earlier observation. So how do we capture this change, the temporal difference? We calculate the new $Q(s, a)$ with the same formula and subtract the previously known $Q(s, a)$ from it:

$$TD_t(a, s) = R(s, a) + \gamma \max_{a'} Q(s', a') - Q_{t-1}(s, a)$$

The equation we just derived gives the temporal difference in the Q-values, which further helps to capture the random changes the environment may impose. The new $Q(s, a)$ is then updated as follows:

$$Q_t(s, a) = Q_{t-1}(s, a) + \alpha \, TD_t(a, s)$$

Here alpha is the learning rate, which controls how quickly the robot adapts to the random changes imposed by the environment, $Q_t(s, a)$ is the current Q-value, and $Q_{t-1}(s, a)$ is the previously recorded Q-value.

So if we replace $TD_t(a, s)$ with its full form, we get

$$Q_t(s, a) = Q_{t-1}(s, a) + \alpha \Big( R(s, a) + \gamma \max_{a'} Q(s', a') - Q_{t-1}(s, a) \Big)$$

We now have all the little pieces of Q-learning together.
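A single application of this update rule can be sketched as follows. The function name `td_update` and the sample numbers are assumptions chosen for illustration; the defaults alpha = 0.9 and gamma = 0.75 are the values used in the implementation part of this session.

```python
def td_update(q_old, reward, max_q_next, alpha=0.9, gamma=0.75):
    """Return the updated Q-value after one temporal-difference step."""
    # Temporal difference: freshly computed estimate minus the previous Q-value.
    td = reward + gamma * max_q_next - q_old
    # Move the old Q-value toward the new estimate at rate alpha.
    return q_old + alpha * td


q = 0.0
q = td_update(q, reward=1.0, max_q_next=0.0)  # first visit, nothing known about s'
print(q)
q = td_update(q, reward=1.0, max_q_next=4.0)  # later visit, s' now has some value
print(q)
```

With alpha close to 1, most of the old Q-value is overwritten on each step, so the robot adapts quickly; a smaller alpha would blend new observations in more slowly.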

Let's move forward to the implementation part. This is the final equation of Q-learning, right? So let's see how we can implement it and obtain the best path for any robot to take. To implement the algorithm, we need to understand the warehouse locations and how they can be mapped to different states. Let's start by recollecting the sample environment. As you can see here, we have L1, L2, L3, up to L9, and we have certain borders as well. First of all, let's map each of the above locations in the warehouse to numbers, or states, so that it will ease our calculations. What I'm going to do is create a new Python 3 file in the Jupyter notebook, and I'll name it Q-learning. Okay, so let's define the states. But before that, what we need to do is import NumPy, because we're going to use NumPy for this purpose, and initialize the parameters.

That is, the gamma and alpha parameters. Gamma is 0.75, which is the discount factor, whereas alpha is 0.9, which is the learning rate. Next, what we're going to do is define the states and map them to numbers. As I mentioned earlier, L1 is 0, and so on up to L9. We have now defined the states in numerical form. The next step is to define the actions, which, as mentioned above, represent the transition to the next state. As you can see here, we have an array of actions from 0 to 8.

Now, what we're going to do is define the reward table. As you can see, it's the same matrix that I showed you just now. If you understood it correctly, there isn't any real barrier limitation as depicted in the image. For example, the transition from L4 to L1 is allowed, but the reward will be zero to discourage that path; in a tougher situation, what we do is add a minus 1 there so that it gets a negative reward. In the above code snippet, as you can see, I took each of the states and put ones in the respective states that are directly reachable from that state. If you refer once again to the reward table we created above, this construction will be easy to understand. One thing to note here is that we did not consider the top-priority location L6 yet. We will also need an inverse mapping from the states back to their original locations, and it will become clearer when we reach the depths of the algorithm. So for that, what we're going to do is have the inverse map, state to location.

We will take each distinct state and location and convert it back. Now what we'll do is define a function, get_optimal_route, which will take a start location and an end location. Don't worry, the code is big, but I'll explain each and every bit of it. The get_optimal_route function takes two arguments, the start location in the warehouse and the end location in the warehouse, respectively, and it returns the optimal route for reaching the end location from the start location, in the form of an ordered list containing the letters. We'll start defining the function by initializing the Q-values to be all zeros. As you can see here, we have set the Q-values to 0, but before that, what we need to do is copy the reward matrix to a new one.

So this is the new rewards matrix. Next, what we need to do is get the ending state corresponding to the ending location. With this information, we will automatically set the priority of the given ending state to the highest one. We are not defining it by hand now; instead we automatically set the priority of the given ending state to 999. Then we initialize the Q-values to 0, and next comes the Q-learning process, which you can see here.

We are taking i in range 1,000, and we're going to pick a state randomly, so we're going to use NumPy's np.random.randint. For traversing through the neighboring locations in the same maze, we iterate through the new reward matrix and get the actions whose reward is greater than 0. After that, we pick an action randomly from the list of playable actions, which leads us to the next state. Then we compute the temporal difference, TD, which is the reward plus gamma times the Q-value of the next state, where we take np.argmax over the Q-values of the next state, minus the Q-value of the current state.

We then update the Q-values using the Bellman equation, as you can see here. After that, we initialize the optimal route with the starting location. Here we do not know the next location yet, so we initialize it with the value of the starting location. We also do not know the exact number of iterations needed to reach the final location, hence a while loop is a good choice for the iteration. We fetch the starting state and then fetch the highest Q-value pertaining to the starting state, which gives us the index of the next state. But we need the corresponding letter, so we use the state-to-location mapping we just mentioned, and after that we update the starting location for the next iteration.

Finally, we'll return the route. So let's take a starting location of L9 and an end location of L1 and see what path we actually get. As you can see here, we get L9, L8, L5, L2, and L1. And if you have a look at the image here, going from L9 to L1 through L8, L5, and L2 is the route that yields the maximum reward for the robot. So now we have come to the end of this Q-learning session.

The past year has seen a lot of great examples of machine learning, and many new high-impact applications of machine learning were discovered and brought to light, especially in healthcare, finance, speech recognition, augmented reality, and much more complex 3D and video applications. Natural language processing was easily the most talked-about domain within the community, with the likes of ULMFiT and BERT being open-sourced. So let's have a look at some of the amazing machine learning projects that are open-sourced; the code is available for you.

These were discussed in the 2018 to 2019 spectrum. The first and foremost is TensorFlow.js. Machine learning in the browser was a fictional thought a few years back and is a stunning reality now. A lot of us in this field are welded to our favorite IDE, but TensorFlow.js has the potential to change your habits. It has become a very popular release since it came out earlier this year, and it continues to amaze with its flexibility. As the repository states, there are primarily three major features of TensorFlow.js: develop machine learning and deep learning models in your browser itself, run pre-existing TensorFlow models within the browser, and retrain or fine-tune these pre-existing models as well. If you are familiar with Keras, the high-level Layers API will seem quite familiar. There are plenty of examples available in the GitHub repository, so do check out those links to quicken your learning curve. And as I mentioned earlier, I'll leave the links to all of these open-source machine learning projects in the description below.

The next one we'll discuss is Detectron. It was developed by Facebook and made a huge splash when it was launched earlier in 2018. It was developed by Facebook's AI research team, FAIR, and it implements state-of-the-art object detection frameworks. It is written in Python and has helped enable multiple projects, including DensePose. We'll see what exactly DensePose is after this example. This repository contains the code for over 70 pre-trained models, so it's a very good open-source model, guys. Do check it out. Now, a moment ago I talked about DensePose.

That's the next one I'm going to talk about. DensePose is dense human pose estimation in the wild. The code to train and evaluate your own DensePose R-CNN model is included here, and I've given the link to the open-source code in the description below. There are notebooks available as well to visualize the DensePose COCO dataset. Next on our list we have deep painterly harmonization. I want you to take a moment to just admire the above images. Can you tell which ones were done by a human and which by a machine? I certainly could not. Here, the first frame is the input image, the original one, and the third frame, as you can see, has been generated by this technique. Amazing, right? The algorithm adds an external object of your choosing to any image and manages to make it look like nothing touched it. Make sure you check out the code and try to implement it on different sets of images yourself.

It is really, really fun. Talking about images, we have image outpainting. What if I give you an image and ask you to extend its boundaries by imagining what it would look like if the entire scene had been captured? You would understandably turn to some image editing software. But here's the awesome news: you can achieve it in a few lines of code with image outpainting. This project is a Keras implementation of Stanford's image outpainting paper, which is an incredibly cool and well-illustrated paper, and this is how most research papers should be. I've given the links in the description below, so check it out, guys, and see how you can implement it. Now let's talk about audio processing, which is another field where machine learning has started to make its mark. It is not just limited to generating music; you can do tasks like audio classification, fingerprinting, segmentation, tagging, and much more. There is a lot that's still yet to be explored, and who knows, perhaps you could use this project to pioneer your way to the top.

Now, what if you want to discover your own planet? That might perhaps be overstating things a bit, but the AstroNet repository will definitely get you close. The Google Brain team discovered two new planets in December 2017 by applying AstroNet. It's a deep neural network meant for working with astronomical data. It goes to show the far-ranging applications of machine learning and was a truly monumental development. Now the team behind the technology has open-sourced the entire code, so go ahead and check it out, and who knows, you might even have a planet to your name.

I could not possibly let this section pass by without mentioning BERT. Google AI's release has smashed records on its way to winning the hearts of NLP enthusiasts and experts alike. Following ULMFiT and ELMo, BERT really blew away the competition with its performance. It obtained state-of-the-art results on 11 NLP tasks. Apart from the official Google repository, there is a Python implementation of BERT that is worth checking out. Whether it marks a new era in natural language processing is something we will soon find out.

Now, AdaNet. I'm sure you guys might have heard of it. It is a framework for automatically learning high-quality models without requiring programming expertise. It's a Google invention.

The framework is based on TensorFlow, and you can build simple models using AdaNet and even extend it to train a neural network. The GitHub page contains the code and examples, the API documentation, and other things to get your hands dirty. Trust me, AutoML is the next big thing in our field.

If you follow a few researchers on social media, you must have come across some of the images I am showing here in video form: a stick human running across a terrain, or trying to stand up, or some such. That, my friends, is reinforcement learning in action. Here's a signature example of it: a framework to create a simulated humanoid that imitates multiple motion skills.

So let's have a look at the top 10 skills that are required to become a successful machine learning engineer. Starting with programming languages, Python is the lingua franca of machine learning. You may have had exposure to Python even if you weren't previously in programming or in a computer science research field. However, it is important to have a solid understanding of classes and data structures.

Sometimes Python won't be enough. Often you'll encounter projects that need to leverage hardware for speed improvements. Make sure you are familiar with the basic algorithms, as well as classes, memory management, and linking. If you want a job in machine learning, you will probably have to learn all of these languages at some point. C++ can help in speeding code up, whereas R works great for statistics and plots, and Hadoop is Java-based, so you will probably need to implement mappers and reducers in Java. Next we have linear algebra. You need to be intimately familiar with matrices, vectors, and matrix multiplication. If you have an understanding of derivatives and integrals, you should be in the clear.

Otherwise even simple concepts like gradient descent will elude you.

Statistics is going to come up a lot. At least make sure you are familiar with Gaussian distributions, means, standard deviations and much more.

Every bit of statistical understanding beyond this helps, and the theory helps in learning about algorithms.

Great examples are Naive Bayes, Gaussian mixture models and hidden Markov models.

You need to have a firm understanding of probability and stats to understand these models. Just go nuts and study measure theory.

Next we have advanced signal processing techniques.

Feature extraction is one of the most important parts of machine learning, and different types of problems need different solutions.

You may be able to utilize really cool advanced signal processing algorithms such as wavelets, shearlets, curvelets and bandlets.

You need to learn about time-frequency analysis and try to apply it to your problems.

This skill will give you an edge while you're applying for a machine learning engineer job.

Next we have applied maths. A lot of machine

learning techniques out there are just fancy types of function approximation.

These often get developed by theoretical mathematicians and then get applied by people who do not understand the theory at all.

The result is that many developers might have a hard time finding the best technique for their problem.

So even a basic understanding of numerical analysis will give you a huge edge.

Having a firm understanding of algorithm theory and knowing how the algorithm works, you can also discriminate between models such as SVMs.

You will need to understand subjects such as gradient descent, convex optimization, Lagrange multipliers, quadratic programming, partial differential equations and much more.

All this math might seem intimidating at first if you have been away from it for a while. Yes, machine learning is much more math-intensive than something like front-end development.

Just like any other skill, getting better at math is a matter of focused practice.

The next skill in our list is neural network architectures.

We need machine learning for tasks that are too complex for humans to code directly, that is, tasks so complex that hand-coding them is impractical.

Neural networks are a class of models within the general machine learning literature. They are a specific set of algorithms that have revolutionized machine learning.
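To make the idea of a neural network as a model a little more concrete, here is a minimal sketch in Python of a tiny two-layer network that computes XOR. The weights are picked by hand purely for illustration (an assumption; a real network would learn them from data), and the names `step` and `xor_net` are hypothetical.

```python
def step(x):
    """Threshold activation: fires (1) when the weighted input is positive."""
    return 1 if x > 0 else 0


def xor_net(a, b):
    """Tiny two-layer network with hand-picked (not learned) weights."""
    # Hidden layer: one neuron behaves like OR, the other like AND.
    h_or = step(1.0 * a + 1.0 * b - 0.5)
    h_and = step(1.0 * a + 1.0 * b - 1.5)
    # Output neuron: fires when OR is on but AND is off, which is XOR.
    return step(1.0 * h_or - 1.0 * h_and - 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

A single threshold neuron cannot compute XOR, but two stacked layers can, which hints at why layered networks can represent more complex mappings.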

Neural networks are inspired by biological neural networks, and the current so-called deep neural networks have proven to work quite well.

Neural networks are themselves general function approximators, which is why they can be applied to almost any machine learning problem that is about learning a complex mapping from the input to the output space.

Of course, there are good reasons for the surge in the popularity of neural networks: they have been by far the most accurate way of approaching many problems like translation, speech recognition and image classification.

Now coming to our next point, which is natural language processing.

Since it combines computer science and linguistics, there are a bunch of libraries like NLTK and Gensim, and techniques such as sentiment analysis and summarization that are unique to NLP.

Audio and video processing has a frequent overlap with natural language processing. However, natural language processing can also be applied to non-audio data like text.

Voice and audio analysis involves extracting useful information from the audio signals themselves.

Being well versed in math will get you far in this one, and you should also be familiar with concepts such as the fast Fourier transform.

Now, these were

the technical skills that are required to become a successful machine learning engineer.

Next I'm going to discuss some of the non-technical skills, or soft skills, which are required to become a machine learning engineer.

First of all, we have industry knowledge.

The most successful machine learning projects out there are going to be those that address real pain points.

Whichever industry you are working for, you should know how that industry works and what will be beneficial for the business.

If a machine learning engineer does not have business acumen and the know-how of the elements that make up a successful business model, then all those technical skills cannot be channeled productively.

You won't be able to discern the problems and potential challenges that need solving for the business to sustain and grow, and you won't really be able to help your organization explore new business opportunities.

So this is a must-have skill.

Next we have effective communication.

You'll need to explain machine learning concepts to people with little to no expertise in the field.

Chances are you'll need to work with a team of engineers as well as many other teams, so communication is going to make all of this much easier.

Companies searching for a strong machine learning engineer are looking for someone who can clearly and fluently translate technical findings to a non-technical team such as the marketing or sales department.

Next on our list we have rapid prototyping.

Iterating on ideas as quickly as possible is mandatory for finding one that works.

In machine learning this applies to everything from picking the right model to working on projects such as A/B testing.

The term comes from manufacturing, where it refers to a group of techniques used to quickly fabricate a scale model of a physical part or assembly using three-dimensional computer-aided design (CAD) data.

Last but not least, we have the final skill, and that is to keep updated.

You must stay up to date with any upcoming changes.

Every month new neural network models come out that outperform the previous architectures.

It also means being aware of the news regarding the development of the tools, the changelogs, the conferences and much more.

You need to know about the theories and algorithms, and you can achieve this by reading research papers and blogs and watching conference videos.

You also need to follow the online community, which changes very quickly, so expect and cultivate this change.

Now, this is not the end. Here we have certain bonus skills, which will give you an edge over other competitors who are applying for a machine learning engineer position.

As the first bonus point, we have physics.

You might be in a situation where you would like to apply machine learning techniques to a system that will interact with the real world, and having some knowledge of physics will take you far.

Next we have reinforcement learning.

Reinforcement learning has been a driver behind many of the most exciting developments in the deep learning and AI community, from AlphaGo Zero to OpenAI's Dota 2 bot.

It will be critical to understand if you want to go into robotics, self-driving cars or other AI-related areas.

And finally we have computer vision.

Out of all the disciplines out there, there are by far the most resources available for learning computer vision.

This field appears to have the lowest barriers to entry, but of course this likely means you will face slightly more competition.

So having a good knowledge of computer vision and how it works will give you an edge over other competitors.

I hope you got acquainted with all the skills which are required to become a successful machine learning engineer.

As you know, we are living in a world of humans and machines.

In today's world, these machines, or robots, have to be programmed before they start following your instructions.

But what if the machines started learning on their own from their experience, working like us, feeling like us and doing things more accurately than us?

Well, this is where the machine learning engineer comes into the picture, to make sure everything is working according to the procedures and the guidelines.

In my opinion, machine learning is one of the most recent and exciting technologies there is. You probably use it dozens of times every day without even knowing it.

So before we get into the different roles, the salary trends and what should be on the resume of a machine learning engineer applying for a job, let's understand who exactly a machine learning engineer is.

Machine learning engineers are sophisticated programmers who develop machines and systems that can learn and apply knowledge without specific direction.

Artificial intelligence is the goal of a machine learning engineer.

They are computer programmers, but their focus goes beyond specifically programming machines to perform specific tasks.

They create programs that will enable machines to take actions without being specifically directed to perform those tasks.

Now let's have a look at the job trends of machine learning in general.

As you can see, in Seattle itself we have 2,000 jobs. In New York we have 1,100, in San Francisco we have 1,100, in Bengaluru, India, we have 1,100, and then we have Sunnyvale, California, where we also have a number of jobs.

So as you can see, the number of jobs in the market is very high, and with the emergence of machine learning and artificial intelligence this number is just going to get higher.

If you have a look at the job openings by salary bracket, you can see that for the $90,000 per annum bracket we have 32.7 percent, and that's the maximum.

So be assured that if you get a job as a machine learning engineer, you'll probably get around 90 thousand bucks a year. That's safe to say.

For $110,000 per year we have 25 percent, and for $120,000 we have almost 20 percent.

Then we have $130,000, which are the senior machine learning engineers, and that's 13.67 percent.

And finally, we have the most senior machine learning engineers or the data scientists, who have a salary of $140,000 per annum, and the percentage for that one is really low.

So as you can see, there is a great opportunity for people who are trying to go into the machine learning field and get started with it.

So let's have a look at the machine learning engineer salary.

The average salary in the U.S. is around $111,490, and the average salary in India is around seven lakh nineteen thousand six hundred forty-six rupees.

That's a very good average salary for any particular profession.

Moving forward, if we have a look at the salary of an entry-level machine learning engineer, it ranges from $76,000 or $77,000 to $151,000 per annum. That's a huge salary.

And if you talk about the bonus, here we have something like $3,000 to $25,000, depending on the work you do and the project you are working on.

Let's talk about profit sharing now.

It's around $2,000 to $50,000.

This again depends upon the project you are working on, the company you are working for and the percentage that they give to the engineer or the developer for that particular project.

The total pay comes to around $76,000 or $75,000 to $162,000, and this is just for the entry-level machine learning engineer.

Just imagine: if you become an experienced machine learning engineer, your salary is going to go through the roof.

So now that we have understood who exactly a machine learning engineer is, the various salary trends, the job trends in the market and how they are rising, let's understand what skills it takes to become a machine learning engineer.

First of all, we have programming languages.

Programming languages are a big deal when it comes to machine learning, because you don't just need proficiency in one language. You might require proficiency in Python, Java, R or C++.

You might be working in a Hadoop environment where you require Java programming to write the MapReduce code, sometimes R is great for visualization purposes, and Python is, you know, another favorite language when it comes to machine learning.

The next skill that an individual needs is calculus and statistics.

A lot of machine learning algorithms are mostly maths and statistics.

A lot of statistics is required, and above all matrix multiplication, so a good understanding of calculus as well as statistics is required.

Next we have signal processing.

Advanced signal processing is something that will give you an upper edge over other machine learning engineers if you are applying for a job anywhere.

The next skill we have is applied maths.

As I mentioned earlier, many of the machine learning algorithms here are purely mathematical formulas, so a good understanding of maths and how the algorithm works will take you far ahead.

Next on our list we have neural networks.

Neural networks have become very popular in recent years, and due to their efficiency and the extent to which they can work and get results quickly, neural networks are a must for a machine learning engineer.

Moving forward, we have language processing.

A lot of times machine learning engineers have to deal with text data and voice data as well as video data.

Processing any kind of language, audio or video is something that a machine learning engineer has to do on a daily basis, so one needs to be proficient in this area also.

These are only some of the skills which are, I would say, absolutely necessary for any machine learning engineer.

So let's now discuss the job description, or the roles and responsibilities, of a machine learning engineer.

Depending on their level of expertise, machine learning engineers may have to study and transform data science prototypes.

They need to design machine learning systems.

They also need to research and implement appropriate machine learning algorithms and tools, as it's a very important part of the job.

They need to develop new machine learning applications according to the industry requirements.

They need to select the appropriate data sets and data representation methods, because even a slight deviation in the data set or the data representation is going to affect the model a lot.

They need to run machine learning tests and experiments.

They need to perform statistical analysis and fine-tuning using the test results.

Sometimes people ask what exactly the difference is between a data analyst and a machine learning engineer.

Statistical analysis is just a small part of a machine learning engineer's job, whereas it covers a large part of a data analyst's job.

Machine learning engineers might need to train and retrain the systems whenever necessary, and they also need to extend the existing machine learning libraries and frameworks to their full potential so that the model works superbly.

And finally, they need to keep abreast of the developments in the field.

Needless to say, any machine learning engineer, or any individual for that matter, has to stay updated on the technologies that are coming into the market, and every now and then a new technology arises which will overthrow the older one.

So you need to be up to date.

Now coming to the resume part of a machine learning engineer.

The resume of a machine learning engineer should consist of a clear career objective, the skills which the individual possesses, the educational qualifications, certain certifications, the past experience if you are an experienced machine learning engineer, and the projects which you have worked on. And that's it.

So let's have a look at the various elements that are required in a machine learning engineer's resume.

First of all, you need to have a clear career objective.

Here you need not stretch it too much; keep it as precise as possible.

Next we have the skills required, and these skills can be technical as well as non-technical.

So let's have a look at the various technical and non-technical skills out here, starting with the technical skills.

First of all, we have programming languages such as R, Java, Python and C++.

But the first and foremost requirement is to have a good grip on at least one programming language, preferably Python, as it is easy to learn and its applications are wider than those of any other language.

It is important to have a good understanding of topics like data structures, memory management and classes.

Although Python is a very good language, it alone cannot help you, so you will probably have to learn all these languages like C++, R, Python and Java, and also work on MapReduce at some point of time.

Next on our list we have calculus, linear algebra and statistics.

You'll need to be intimately familiar with matrices, vectors and matrix multiplication.

Statistics is going to come up a lot, so at least make sure you are familiar with the Gaussian distribution, means, standard deviations and much more.

You also need to have a firm understanding of probability and stats to understand the machine learning models.

Next, as I mentioned earlier, it's signal processing techniques.

Feature extraction is one of the most important parts of machine learning, and different types of problems need different solutions.

So you may be able to utilize really cool advanced signal processing algorithms such as wavelets, shearlets, curvelets and bandlets.

Try to learn about time-frequency analysis and try to apply it to your problems, as it gives you an upper edge over other machine learning engineers. So just go for it.

Next we have mathematics.

A lot of machine learning techniques out there are just fancy types of function approximation.

Having a firm understanding of algorithm theory and knowing how the algorithm works is really necessary, and understanding subjects like gradient descent, convex optimization, quadratic programming and partial differentiation will help a lot.

Next come neural networks, as I was saying earlier.
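Before getting into neural networks, here is a rough sketch in Python of gradient descent, the first subject in that list. The objective f(x) = (x - 3)^2, the learning rate and the step count are all assumptions chosen only for illustration, and `gradient_descent` and `grad_f` are hypothetical names.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# Toy objective (an assumption for illustration): f(x) = (x - 3)^2,
# whose derivative is f'(x) = 2 * (x - 3) and whose minimum is at x = 3.
def grad_f(x):
    return 2 * (x - 3)


x_min = gradient_descent(grad_f, x0=0.0)
print(round(x_min, 4))  # prints 3.0
```

Each step moves x a little way downhill, so the distance to the minimum shrinks by a constant factor every iteration. This is where the derivatives from calculus come in.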

We need machine learning for tasks that are too complex for humans to code directly, that is, tasks so complex that hand-coding them is impractical.

Neural networks are a class of models within the general machine learning literature.

They are a specific set of algorithms that have revolutionized machine learning, and deep neural networks have proven to work quite well.

Neural networks are themselves general function approximators, which is why they can be applied to almost any machine learning problem out there, and they help a lot in learning a complex mapping from the input to the output space.

Next we have language processing.

Natural language processing combines two of the major areas of work, linguistics and computer science, and chances are that at some point you are going to work with either text, audio or video.

So it's necessary to have control over libraries like Gensim and NLTK and techniques like word2vec, sentiment analysis and text summarization.

Voice and audio analysis involves extracting useful information from the audio signals themselves.

Being well versed in maths and in concepts like the Fourier transform will get you far in this one.
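As a small illustration of what the Fourier transform gives you, here is a minimal sketch in Python of a naive discrete Fourier transform that finds the dominant frequency in a short signal. The test signal and the function name `dft` are assumptions made up for this example. In practice you would use a fast Fourier transform from a numerical library, which computes the same values much faster.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform, O(n^2) in the signal length."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# Hypothetical test signal: 8 samples of a sine wave completing 2 cycles.
n = 8
signal = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]

magnitudes = [abs(c) for c in dft(signal)]
# Only search the lower half: for real input the upper half mirrors it.
dominant = max(range(n // 2), key=lambda k: magnitudes[k])
print(dominant)  # prints 2
```

The peak at index 2 recovers the two cycles we put into the signal, which is exactly the kind of information you extract from audio before feeding it to a model.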

These were the technical skills that are required, but be assured that there are a lot of non-technical skills that are also required to land a good job in the machine learning industry.

First of all, you need to have industry knowledge.

The most successful machine learning projects out there are going to be those that address real pain points, don't you agree?

So whichever industry you are working for, you should know how that industry works and what will be beneficial for the industry.

Now, if a machine learning engineer does not have business acumen and the know-how of the elements that make up a successful business model, all those technical skills cannot be channeled productively.

You won't be able to discern the problems and the potential challenges that need solving for the business to sustain and grow.

Next on our list we have effective communication, and this is one of the most important parts of any job's requirements.

You'll need to explain machine learning concepts to people with little to no expertise in the field.

Chances are you will need to work with a team of engineers as well as many other teams, like the marketing and sales teams, so communication is going to make all of this much easier.

Companies searching for a strong machine learning engineer are looking for someone who can clearly and fluently translate technical findings to a non-technical team.

Rapid prototyping is another skill which is very much required for any machine learning engineer.

Iterating on ideas as quickly as possible is mandatory for finding the one that works.

In machine learning this applies to everything from picking the right model to working on projects such as A/B testing and much more.

The term comes from manufacturing, where it refers to a group of techniques used to quickly fabricate a scale model of a physical part or assembly using three-dimensional computer-aided design (CAD) data.

Now coming to the final skill which is required for any machine learning engineer, and that is to keep updated.

You must stay up to date with any upcoming changes. Every month new neural network models come out that outperform the previous architectures.

It also means being aware of the news regarding the development of the tools, theory and algorithms through research papers, blogs, conference videos and much more.

Another part of any machine learning engineer's resume is the educational qualification.

A bachelor's or master's degree in computer science or IT, economics, statistics or even mathematics can help you land a job in machine learning.

Plus, if you are an experienced machine learning engineer, some standard company certifications will probably help you a lot when landing a good job in machine learning.

And finally, coming to the professional experience.

You need to have experience in computer science, statistics or data analysis if you are switching from any other profession into machine learning engineering, and if you have previous experience in machine learning, that is even better.

Now finally, if we talk about the projects: not just any project that you have worked on will do.

You need to have worked on machine learning related projects that involve a certain level of AI and work with neural networks to a certain degree to land a good job as a machine learning engineer.

Now, if you have a look at the companies hiring machine learning engineers, every other company is looking for machine learning engineers who can modify the existing model into something that does not need much maintenance and can self-sustain.

So basically, working on artificial intelligence and new algorithms that can work on their own is what every company desires.

We have Amazon and Facebook. We have tech giants like Microsoft and IBM. In the gaming industry, or rather the GPU or graphics industry, we have Nvidia. In the banking industry we have JPMorgan Chase. Then we have LinkedIn, and also we have Walmart.

So all of these companies require machine learning engineers at some point of time.

So be assured that if you are looking for a machine learning engineer post, every other company, be it a big-shot company or even a new startup, is looking for machine learning engineers. So be assured you will get a job.

With this we come to the end of this video.

I hope you've got a good understanding of who exactly a machine learning engineer is, the various job trends, the salary trends, what skills are required to become a machine learning engineer and, once you become a machine learning engineer, what the roles and responsibilities, or the job description, look like.

And also I hope you got to know how to prepare your resume in the correct format and what to keep in the resume: the career objective, the technical and non-technical skills, previous experience, educational qualifications and certain projects which are related to it.

So that's it, guys. Edureka, as you know, provides a Machine Learning Engineer Masters Program that is aligned in such a way that it will get you acquainted with all the skills that are required to become a machine learning engineer, and that too in the correct form.