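Expectation Maximization (EM) is an iterative method for estimating the parameters of statistical distributions, typically used when the data come from a mixture of distributions and we do not know which distribution produced each data point. Each iteration consists of two steps: i) the expectation step, where the current parameters are used to compute, for every data point, the probability that it came from each distribution (its responsibility); ii) the maximization step, where the parameters are recalculated from the data, with each point weighted by the responsibilities found in the expectation step.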
Let's see an example. Suppose we have a dataset with only one feature, age. Suppose the data were collected from two countries, one where most people are young and another where most are old. We want to find the mean and variance of each of the two age distributions, assuming that both are normal.
Say our dataset looks like this:
$[25, 27, 27, 28, 28, 30, 100, 105, 105, 107, 107, 108]$. We shall implement the EM algorithm to estimate the means and variances.
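For this two-distribution case, the two steps can be written out explicitly. The notation here is introduced only for illustration: $x_i$ is the $i$-th age, $n$ is the number of data points, $p_k$, $\mu_k$ and $\sigma_k^2$ are the weight, mean and variance of distribution $k \in \{1, 2\}$, and $r_{ik}$ is the responsibility of distribution $k$ for point $i$.

Expectation step:

$$r_{ik} = \frac{p_k \, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}{p_1 \, \mathcal{N}(x_i \mid \mu_1, \sigma_1^2) + p_2 \, \mathcal{N}(x_i \mid \mu_2, \sigma_2^2)}$$

Maximization step:

$$\mu_k = \frac{\sum_{i=1}^{n} r_{ik} \, x_i}{\sum_{i=1}^{n} r_{ik}}, \qquad \sigma_k^2 = \frac{\sum_{i=1}^{n} r_{ik} \, (x_i - \mu_k)^2}{\sum_{i=1}^{n} r_{ik}}, \qquad p_k = \frac{1}{n} \sum_{i=1}^{n} r_{ik}$$

The variance update uses the mean that has just been recalculated, so within one iteration the order is: mean, then variance, then weight.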
    clc; clear;

    % sample dataset
    d = [25 27 27 28 28 30 100 105 105 107 107 108];

    % initial guesses of the two means and variances
    % (sigma1 and sigma2 hold variances, hence the sqrt passed to normpdf)
    mu1 = 10; mu2 = 50;
    sigma1 = 10; sigma2 = 10;

    % initial mixing proportions (prior weight of each distribution)
    p1 = .5; p2 = .5;

    % run a fixed number of iterations, assumed to be enough for convergence
    for k = 1:100
        % start expectation step
        % normpdf requires the Statistics and Machine Learning Toolbox
        v1 = p1 * normpdf(d, mu1, sqrt(sigma1));
        v2 = p2 * normpdf(d, mu2, sqrt(sigma2));
        r1 = v1 ./ (v1 + v2); % responsibility of distribution 1 for each point
        r2 = v2 ./ (v1 + v2); % responsibility of distribution 2 for each point
        % end expectation step

        % start maximization step
        mu1 = sum(r1 .* d) / sum(r1); % responsibility-weighted mean
        mu2 = sum(r2 .* d) / sum(r2);
        sigma1 = sum((d - mu1).^2 .* r1) / sum(r1); % responsibility-weighted variance
        sigma2 = sum((d - mu2).^2 .* r2) / sum(r2);
        p1 = sum(r1) / length(d); % mixing proportion: total responsibility / number of points
        p2 = sum(r2) / length(d);
        % end maximization step
    end

    mu1    % 27.5000, mean of the first distribution
    mu2    % 105.3333, mean of the second distribution
    sigma1 % 2.2500, variance of the first distribution
    sigma2 % 6.8889, variance of the second distribution
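The loop above always runs 100 iterations. A common alternative is to stop once the log-likelihood of the data stops improving. The following is a minimal sketch of that idea, not a definitive implementation: the tolerance `tol`, the iteration cap `maxIter` and the helper `gauss` are illustrative choices that do not appear in the original code. The normal density is written out by hand so that the sketch does not depend on `normpdf`.

```matlab
% Sketch: EM with a log-likelihood stopping rule.
% tol, maxIter and gauss are assumed, illustrative choices.
d = [25 27 27 28 28 30 100 105 105 107 107 108];

mu = [10 50];    % initial means
v  = [10 10];    % initial variances
p  = [0.5 0.5];  % initial mixing proportions

tol = 1e-8;      % assumed convergence tolerance on the log-likelihood
maxIter = 100;   % assumed upper bound on the number of iterations
prevLL = -Inf;

% normal density with mean m and variance s2, evaluated elementwise
gauss = @(x, m, s2) exp(-(x - m).^2 ./ (2 * s2)) ./ sqrt(2 * pi * s2);

for k = 1:maxIter
    % expectation step: weighted densities under the current parameters
    w1 = p(1) * gauss(d, mu(1), v(1));
    w2 = p(2) * gauss(d, mu(2), v(2));

    % log-likelihood of the data under the current parameters
    ll = sum(log(w1 + w2));
    if abs(ll - prevLL) < tol
        break;   % no meaningful improvement since the last iteration
    end
    prevLL = ll;

    r1 = w1 ./ (w1 + w2);  % responsibility of distribution 1
    r2 = 1 - r1;           % responsibility of distribution 2

    % maximization step: weighted means, variances and proportions
    mu = [sum(r1 .* d) / sum(r1), sum(r2 .* d) / sum(r2)];
    v  = [sum(r1 .* (d - mu(1)).^2) / sum(r1), ...
          sum(r2 .* (d - mu(2)).^2) / sum(r2)];
    p  = [sum(r1), sum(r2)] / numel(d);
end

fprintf('stopped after %d iterations\n', k);
```

On this dataset the two groups are well separated, so the sketch should settle on the same estimates as the fixed-iteration version while running fewer iterations. The log-likelihood is computed from quantities the expectation step already needs, so the check adds very little work per iteration.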