In this case, sample.

In a previous post I described the debate over whether to sample or collapse missing observation variables. I have now run the experiments, and present the results here as promised. These experiments are based on my infinite sparse Factor Analysis model, which you can read about here, run on the gene expression dataset included in Mike West's BFRM package here.


Figure 1. Twenty runs of 1000 MCMC iterations of my infinite sparse Factor Analysis model with different approaches to handling the missing values.

Figure 1 shows that the uncollapsed sampler with no added noise (red) performs best: it achieves the lowest predictive error in the shortest time. Adding noise to the imputed missing values (as a genuine Gibbs sampling scheme requires) in the uncollapsed sampler (green) both worsens the predictive error and, surprisingly, increases the run time as well (the latter could simply be the cost of drawing the noise at each iteration). The collapsed sampler achieves better predictive error and run time than the uncollapsed sampler with noise.
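To make the two uncollapsed variants concrete, here is a minimal sketch of the imputation step in a linear-Gaussian factor model. The notation (X, G, F, psi) and the update itself are my assumptions for illustration, not the post's actual code; the point is that the red and green curves differ only in whether missing entries are set to the conditional mean or drawn from the full conditional with noise added.

```python
import numpy as np

rng = np.random.default_rng(0)

def impute_missing(X, G, F, psi, mask, add_noise=True):
    """One uncollapsed update of the missing entries of X.

    Assumed (illustrative) model:
        X = G @ F + E,   E[d, n] ~ N(0, psi[d])
    with X the D x N data matrix, G the D x K loadings, F the K x N
    factors, psi a length-D vector of noise variances, and mask a
    boolean D x N array that is True where X is missing.
    """
    mean = G @ F                  # conditional mean of X given G and F
    X = X.copy()
    if add_noise:
        # genuine Gibbs step: draw x_miss from its full conditional
        std = np.broadcast_to(np.sqrt(psi)[:, None], mean.shape)
        X[mask] = mean[mask] + rng.normal(size=mask.sum()) * std[mask]
    else:
        # "no added noise" variant: plug in the conditional mean
        X[mask] = mean[mask]
    return X
```

Collapsing, by contrast, roughly amounts to dropping the missing entries from the likelihood entirely rather than imputing them.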

Figure 2. Boxplot of prediction error after 1000 MCMC steps for each missing value approach.

Figure 2 confirms this conclusion: sampling the missing values without adding any noise gives the best performance.

On a related note, I've been looking at the effect of relaxing the assumption of isotropic noise, i.e. giving each dimension its own noise variance (a diagonal rather than spherical noise covariance). This seems quite a reasonable thing to do, and it barely complicates the calculations, as the sketch below suggests.
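For concreteness, here is a minimal sketch of why the diagonal model costs so little extra. Assuming conjugate Gamma priors on the noise precisions (the priors, the hyperparameters a and b, and the names below are my assumptions, not necessarily the post's actual model), the only change from the isotropic update is summing the squared residuals per dimension rather than over the whole matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_noise_precision(X, G, F, a=1.0, b=1.0, isotropic=False):
    """Conjugate Gamma update for the noise precision(s).

    Assumed (illustrative) setup: residuals E = X - G @ F with
    E[d, n] ~ N(0, 1 / lambda_d) and lambda_d ~ Gamma(a, b).
    With isotropic=True a single precision is shared across all D
    dimensions; otherwise each dimension gets its own (the diagonal
    noise model of Figures 3-4).
    """
    R = X - G @ F                             # D x N residuals
    D, N = R.shape
    if isotropic:
        shape = a + 0.5 * D * N
        rate = b + 0.5 * np.sum(R ** 2)       # pool over every entry
        return np.full(D, rng.gamma(shape, 1.0 / rate))
    shape = a + 0.5 * N
    rate = b + 0.5 * np.sum(R ** 2, axis=1)   # per-dimension sums
    return rng.gamma(shape, 1.0 / rate)       # one draw per dimension
```

The per-dimension update is cheap: one extra axis argument to the residual sum, which matches the observation that the calculations are not much more involved.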

Figure 3. Twenty MCMC runs of 1000 iterations with isotropic vs. non-isotropic noise models.

Figure 4. Boxplot of predictive error with diagonal or isotropic noise model.

Figures 3-4 confirm the intuition that a full diagonal noise model does improve the predictive performance of the model.
