A Barzilay & Lee-style content model

Lately I've been reading up on content models, and I have implemented a very simple version of Gibbs sampling for them. I think the code mostly speaks for itself, but I'd like to point out that the bottleneck is looping over all the words in each sentence, once per topic, to compute the probabilities. That is slightly painful, but with scipy's weave it is at least possible to make it not so slow. There are many ways of using this code. It also requires the hyperparameter sampling routines from the last post, and the obvious implementations of probability density functions.
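To make the bottleneck concrete, here is a minimal sketch of that inner loop in plain numpy. It is an illustration under my own assumptions, not the code from this post: the names `sentence_topic_log_probs`, `sample_topic`, `topic_word_counts`, `topic_totals` and `beta` are all hypothetical, it assumes a symmetric Dirichlet prior `beta` on each topic's word distribution, and it only covers the word (emission) term, leaving out the topic-transition term a full content model would also need.

```python
import numpy as np

def sentence_topic_log_probs(sentence, topic_word_counts, topic_totals, beta):
    """Log-probability of a sentence's words under each topic.

    sentence          : list of integer word ids
    topic_word_counts : (K, V) counts, excluding the sentence being resampled
    topic_totals      : (K,) row sums of topic_word_counts
    beta              : symmetric Dirichlet smoothing (an assumption of this sketch)
    """
    n_topics, vocab_size = topic_word_counts.shape
    log_probs = np.zeros(n_topics)
    # This double loop (topics x words) is the slow part.
    for k in range(n_topics):
        seen = {}  # words of this sentence already accounted for
        for i, w in enumerate(sentence):
            c = seen.get(w, 0)
            log_probs[k] += np.log(topic_word_counts[k, w] + beta + c)
            log_probs[k] -= np.log(topic_totals[k] + vocab_size * beta + i)
            seen[w] = c + 1
    return log_probs

def sample_topic(log_probs, rng):
    """Draw a topic index with probability proportional to exp(log_probs)."""
    p = np.exp(log_probs - log_probs.max())  # subtract the max for stability
    p /= p.sum()
    return rng.choice(len(p), p=p)

if __name__ == "__main__":
    counts = np.array([[5.0, 0.0], [0.0, 5.0]])
    lp = sentence_topic_log_probs([0, 0], counts, counts.sum(axis=1), beta=1.0)
    print(sample_topic(lp, np.random.default_rng(0)))
```

The cost is proportional to the number of topics times the sentence length, for every sentence on every sweep, which is why moving this loop into compiled code pays off.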

 

Enjoy (if you're interested in this sort of thing).