Suppose we have the following dataset, where the target variable is whether a movie will be a hit or not, and the feature variables are the action rating and the story rating (each a whole number between 1 and 10).
But no such point is present in the data set, so should we set this probability to zero? And do the same with the second expression? That would mean any unseen point always drives both probabilities to zero. So how do we resolve this issue? Let's get there.
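To make the problem concrete, here is a minimal sketch of what happens when the conditional probabilities are estimated by simple counting. The rows and the helper `frequency_estimate` are hypothetical and made up for illustration; they are not the data set from this article.

```python
# Hypothetical toy data (NOT the article's table): (action, story, hit)
rows = [
    (8, 7, 1),
    (9, 6, 1),
    (7, 8, 1),
    (3, 4, 0),
    (2, 5, 0),
    (4, 3, 0),
]


def frequency_estimate(rows, feature_index, value, label):
    """Estimate P(feature = value | class = label) by simple counting."""
    in_class = [r for r in rows if r[2] == label]
    matches = [r for r in in_class if r[feature_index] == value]
    return len(matches) / len(in_class)


# An action rating of 6 never appears in the toy data,
# so the counting estimate is zero for both classes.
p_hit = frequency_estimate(rows, 0, 6, 1)
p_flop = frequency_estimate(rows, 0, 6, 0)
print(p_hit, p_flop)
```

With both estimates at zero, the two class scores cannot be compared, which is exactly the issue raised above.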
To calculate the two remaining conditional probabilities, we assume that the values of each feature in the data set are sampled from a Gaussian distribution whose mean and variance are estimated from the sample points. As a reminder, this is what a Gaussian distribution looks like:
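The density of a Gaussian with mean mu and variance sigma^2 at a point x is exp(-(x - mu)^2 / (2 * sigma^2)) / sqrt(2 * pi * sigma^2). As a rough sketch of how this gives a usable likelihood for an unseen value, the code below estimates the mean and variance per class and evaluates the density. The ratings and the function names `gaussian_pdf` and `mean_and_variance` are assumptions made for illustration, not the article's data, and the variance uses the n - 1 (sample) convention, which may differ from the one used elsewhere.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def mean_and_variance(values):
    """Sample mean and sample variance (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


# Hypothetical action ratings per class (illustrative only).
action_hit = [8, 9, 7]
action_flop = [3, 2, 4]

mu_hit, var_hit = mean_and_variance(action_hit)     # mean 8.0, variance 1.0
mu_flop, var_flop = mean_and_variance(action_flop)  # mean 3.0, variance 1.0

# An unseen rating of 6 now gets a small but non-zero likelihood in each class.
like_hit = gaussian_pdf(6, mu_hit, var_hit)
like_flop = gaussian_pdf(6, mu_flop, var_flop)
print(like_hit, like_flop)
```

Because the density is positive everywhere, a rating that never appeared in the training data still yields comparable likelihoods for both classes; here the value 6 sits closer to the hit class mean, so its likelihood under "hit" is the larger of the two.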
To apply Naive Bayes, we assumed that the values of every feature come from a Gaussian distribution. But what if that is not the case? Here are a few explanations and points to keep in mind: