Latent Topic Hypertext Model
A probabilistic generative model for hypertext document collections that explicitly models the generation of links.
Specifically, the probability of a link from a word w to a document d depends directly on how frequent the topic of w is in d, in addition to the in-degree
of d. We show how to perform EM learning on this model
efficiently. Because links are not modeled as analogous to words, the model uses far
fewer free parameters and obtains better link prediction results. Below you can find topics
learned with this model, compared with topics learned with the LDA model,
as well as the exact datasets we used.
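
To make the link dependence described above concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the score of a link from a word to a candidate document is the product of that document's proportion of the word's topic and a per-document weight standing in for the in-degree effect. The function `link_scores`, the inputs `theta` and `importance`, and the toy numbers are all hypothetical; this is not the released research code and may differ from the exact parameterization of the model.

```python
def link_scores(topic, theta, importance):
    """Score each candidate target document for a link from a word.

    topic      -- index of the topic assigned to the source word
    theta      -- theta[d][z] is the proportion of topic z in document d
                  (hypothetical input)
    importance -- importance[d] is a non-negative weight standing in for
                  the in-degree effect of document d (an assumption)
    """
    return {d: importance[d] * theta[d][topic] for d in theta}


# Toy example: two topics, three candidate target documents.
theta = {
    "a": [0.9, 0.1],
    "b": [0.2, 0.8],
    "c": [0.5, 0.5],
}
importance = {"a": 0.1, "b": 0.1, "c": 0.4}

# Scores for a source word whose topic is 0.
scores = link_scores(topic=0, theta=theta, importance=importance)
best = max(scores, key=scores.get)
```

In this toy setup document "c" receives the highest score for topic 0 even though document "a" is more focused on that topic, because "c" carries a larger importance weight. Under this assumed form, the link component needs only one extra weight per document rather than a separate distribution over link targets per topic.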
Code:
LTHM source code
The current implementation is the research code, built on top of the HTMM
implementation with epsilon set to 1. It will soon be replaced with a cleaner
and more efficient implementation.
Data:
WebKB
original – 8282 HTML files from CMU