LDA is a widely known model that finds topics in text. But, similar to k-means, you have to decide how many topics to look for. The stochastic nature of machine learning algorithms also causes the model to have instabilities, which results in topics being a mixture of multiple topics. This is the issue that led to the development of the “Ensemble LDA” model, which detects the stable topics in the text by using an ensemble of LDA topic models. The model detects and uses topic mixtures for stabilization.

Python Package

It became part of gensim: https://github.com/RaRe-Technologies/gensim/pull/2980.

Thesis Download

EnsembleLDA.pdf

You can also find it on https://www.researchgate.net/publication/369543116

Special Thanks

Thanks to aloosley for his support during the thesis and contributions to the PR.

Also thanks to mpenkov and radim for their code reviews and continuous interest in the work.