jml.topics
Class TopicModel

java.lang.Object
  extended by jml.topics.TopicModel
Direct Known Subclasses:
LDA, LSI, SparseLSA

public abstract class TopicModel
extends java.lang.Object

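Abstract superclass for all topic models. A subclass implements train(), which fits the model to the corpus loaded by readCorpus; the fitted results are available through getTopicMatrix() and getIndicatorMatrix().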

Version:
1.0, Jan 3rd, 2013
Author:
Mingjie Qian

Field Summary
protected  org.apache.commons.math.linear.RealMatrix dataMatrix
          A V-by-N matrix in which each column is the term-count vector of one document (V is the vocabulary size, N the number of documents).
protected  org.apache.commons.math.linear.RealMatrix indicatorMatrix
          An N-by-K matrix holding document-topic associations.
 int nTopic
          Number of topics.
protected  org.apache.commons.math.linear.RealMatrix topicMatrix
          A V-by-K matrix in which each column is one topic (K = nTopic), represented as a vector of term weights.
 
Constructor Summary
TopicModel()
          Default constructor for this topic model.
TopicModel(int nTopic)
          Constructor for this topic model given the number of topics for a corpus.
 
Method Summary
 org.apache.commons.math.linear.RealMatrix getIndicatorMatrix()
          Get the N-by-K topic assignment matrix for this topic model.
 org.apache.commons.math.linear.RealMatrix getTopicMatrix()
          Get the V-by-K topic matrix for this topic model.
 void readCorpus(org.apache.commons.math.linear.RealMatrix dataMatrix)
          Read the corpus for this topic model from a V-by-N term-document count matrix.
abstract  void train()
          Train this topic model to fit the given corpus.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

dataMatrix

protected org.apache.commons.math.linear.RealMatrix dataMatrix
A V-by-N matrix in which each column is the term-count vector of one document (V is the vocabulary size, N the number of documents).
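To illustrate the expected layout (rows are terms, columns are documents), the following is a minimal sketch that builds such a matrix from tokenized documents and a term-to-row vocabulary map. It uses a plain double[][] rather than RealMatrix; the class TermCountMatrix and its method build are hypothetical helpers, not part of jml. A result could then be wrapped in Commons Math's Array2DRowRealMatrix before being passed to readCorpus.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: builds a V-by-N term-count matrix (rows = terms, columns = documents). */
public class TermCountMatrix {

    /**
     * @param docs  N tokenized documents
     * @param vocab maps each of the V terms to a row index in [0, V)
     * @return a V-by-N array whose entry [v][n] counts occurrences of term v in document n;
     *         tokens missing from vocab are ignored
     */
    public static double[][] build(List<List<String>> docs, Map<String, Integer> vocab) {
        int V = vocab.size();
        int N = docs.size();
        double[][] counts = new double[V][N];
        for (int n = 0; n < N; n++) {
            for (String token : docs.get(n)) {
                Integer v = vocab.get(token);
                if (v != null) {
                    counts[v][n] += 1.0; // one more occurrence of term v in document n
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> vocab = new HashMap<>();
        vocab.put("apple", 0);
        vocab.put("banana", 1);
        vocab.put("cherry", 2);
        List<List<String>> docs = List.of(
                List.of("apple", "apple", "banana"),
                List.of("cherry", "banana", "kiwi")); // "kiwi" is out of vocabulary
        double[][] x = build(docs, vocab);
        // prints [[2.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
        System.out.println(Arrays.deepToString(x));
    }
}
```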


topicMatrix

protected org.apache.commons.math.linear.RealMatrix topicMatrix
A V-by-K matrix in which each column is one topic (K = nTopic), represented as a vector of term weights.


indicatorMatrix

protected org.apache.commons.math.linear.RealMatrix indicatorMatrix
An N-by-K matrix holding document-topic associations.


nTopic

public int nTopic
Number of topics.

Constructor Detail

TopicModel

public TopicModel()
Default constructor for this topic model.


TopicModel

public TopicModel(int nTopic)
Constructor for this topic model given the number of topics for a corpus.

Parameters:
nTopic - number of topics to fit a corpus

Method Detail

readCorpus

public void readCorpus(org.apache.commons.math.linear.RealMatrix dataMatrix)
Read the corpus for this topic model from a V-by-N term-document count matrix.

Parameters:
dataMatrix - a V-by-N term-document count matrix (rows are terms, columns are documents)
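readCorpus expects terms along the rows and documents along the columns. If a corpus is stored the other way around (N-by-V, one document per row), it has to be transposed before the call; with Commons Math this is RealMatrix.transpose(). The plain-array sketch below shows the same reshaping; CorpusOrientation and toTermDocument are hypothetical names used only for illustration.

```java
import java.util.Arrays;

public class CorpusOrientation {

    /** Transposes an N-by-V document-term matrix into the V-by-N layout expected by readCorpus. */
    public static double[][] toTermDocument(double[][] docTerm) {
        int N = docTerm.length;     // documents
        int V = docTerm[0].length;  // vocabulary size
        double[][] termDoc = new double[V][N];
        for (int n = 0; n < N; n++) {
            for (int v = 0; v < V; v++) {
                termDoc[v][n] = docTerm[n][v];
            }
        }
        return termDoc;
    }

    public static void main(String[] args) {
        double[][] docTerm = {{2, 1, 0}, {0, 1, 1}}; // 2 documents, 3 terms
        // prints [[2.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
        System.out.println(Arrays.deepToString(toTermDocument(docTerm)));
    }
}
```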

getTopicMatrix

public org.apache.commons.math.linear.RealMatrix getTopicMatrix()
Get the V-by-K topic matrix for this topic model.

Returns:
a V-by-K topic matrix
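A common way to inspect the matrix returned by getTopicMatrix() is to list the highest-weighted terms in each column. The sketch below ranks the rows of one column on a plain double[][] (a RealMatrix can be converted with getData()); TopTerms and topTerms are hypothetical names, and the vocabulary array in main is made-up example data.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class TopTerms {

    /**
     * Returns the row indices (term ids) of the n largest weights in column k
     * of a V-by-K topic matrix, in descending order of weight (ties keep index order).
     */
    public static List<Integer> topTerms(double[][] topicMatrix, int k, int n) {
        return IntStream.range(0, topicMatrix.length)
                .boxed()
                .sorted(Comparator.comparingDouble((Integer v) -> topicMatrix[v][k]).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] vocab = {"apple", "banana", "cherry", "kiwi"};
        double[][] topics = {{0.1, 0.5}, {0.6, 0.1}, {0.3, 0.3}, {0.0, 0.1}}; // V = 4, K = 2
        for (int k = 0; k < topics[0].length; k++) {
            StringBuilder line = new StringBuilder("topic " + k + ":");
            for (int v : topTerms(topics, k, 2)) {
                line.append(' ').append(vocab[v]);
            }
            System.out.println(line);
        }
    }
}
```

This ranks by raw weight. For models such as LSI, whose topic vectors come from singular vectors and can contain negative entries, ranking by absolute value may be more informative.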

getIndicatorMatrix

public org.apache.commons.math.linear.RealMatrix getIndicatorMatrix()
Get the N-by-K topic assignment matrix for this topic model.

Returns:
an N-by-K topic indicator matrix
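To read off a single topic per document from the N-by-K matrix returned by getIndicatorMatrix(), one option is to take the argmax of each row. The sketch below assumes that a larger entry means a stronger document-topic association, which is natural for probability-like outputs but is an assumption here rather than a documented guarantee for every subclass; DominantTopic and dominantTopics are hypothetical names.

```java
import java.util.Arrays;

public class DominantTopic {

    /** For an N-by-K indicator matrix, returns for each document (row) the index of its largest entry. */
    public static int[] dominantTopics(double[][] indicator) {
        int[] result = new int[indicator.length];
        for (int n = 0; n < indicator.length; n++) {
            double[] row = indicator[n];
            int best = 0; // assumes K >= 1; ties resolve to the lowest topic index
            for (int k = 1; k < row.length; k++) {
                if (row[k] > row[best]) {
                    best = k;
                }
            }
            result[n] = best;
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] indicator = {
            {0.7, 0.2, 0.1},
            {0.1, 0.1, 0.8},
            {0.3, 0.4, 0.3}
        };
        // prints [0, 2, 1]
        System.out.println(Arrays.toString(dominantTopics(indicator)));
    }
}
```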

train

public abstract void train()
Train this topic model to fit the given corpus.
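The contract of train() can be illustrated with a small self-contained sketch. It mirrors TopicModel's fields with plain double[][] arrays in a hypothetical SimpleTopicModel class (instead of Commons Math RealMatrix) and implements train() with non-negative matrix factorization using Lee-Seung multiplicative updates. NMF is a technique swapped in for illustration; it is not one of jml's listed subclasses (LDA, LSI, SparseLSA). The names NmfTopicModel, maxIter and reconstructionError are all hypothetical.

```java
import java.util.Random;

/** Hypothetical stand-in for TopicModel, using double[][] instead of RealMatrix. */
abstract class SimpleTopicModel {
    protected double[][] dataMatrix;      // V-by-N term counts
    protected double[][] topicMatrix;     // V-by-K, one topic per column
    protected double[][] indicatorMatrix; // N-by-K, one document per row
    public int nTopic;

    SimpleTopicModel(int nTopic) { this.nTopic = nTopic; }

    public void readCorpus(double[][] dataMatrix) { this.dataMatrix = dataMatrix; }
    public double[][] getTopicMatrix() { return topicMatrix; }
    public double[][] getIndicatorMatrix() { return indicatorMatrix; }
    public abstract void train();
}

/** Hypothetical subclass: fits X approximately as T * I^T by NMF (multiplicative updates). */
public class NmfTopicModel extends SimpleTopicModel {
    private static final double EPS = 1e-12; // guards against division by zero
    private final int maxIter;

    public NmfTopicModel(int nTopic, int maxIter) {
        super(nTopic);
        this.maxIter = maxIter;
    }

    @Override
    public void train() {
        int V = dataMatrix.length, N = dataMatrix[0].length, K = nTopic;
        Random rng = new Random(0); // fixed seed so runs are reproducible
        double[][] T = randomPositive(V, K, rng);
        double[][] I = randomPositive(N, K, rng);
        for (int it = 0; it < maxIter; it++) {
            // I <- I .* (X^T T) ./ (I T^T T)
            update(I, mul(transpose(dataMatrix), T), mul(I, mul(transpose(T), T)));
            // T <- T .* (X I) ./ (T I^T I), using the freshly updated I
            update(T, mul(dataMatrix, I), mul(T, mul(transpose(I), I)));
        }
        topicMatrix = T;
        indicatorMatrix = I;
    }

    /** Squared Frobenius norm of X - T * I^T. */
    public static double reconstructionError(double[][] X, double[][] T, double[][] I) {
        double err = 0;
        for (int v = 0; v < X.length; v++) {
            for (int n = 0; n < X[0].length; n++) {
                double approx = 0;
                for (int k = 0; k < T[0].length; k++) approx += T[v][k] * I[n][k];
                err += (X[v][n] - approx) * (X[v][n] - approx);
            }
        }
        return err;
    }

    /** Element-wise A <- A .* num ./ (den + EPS); keeps entries non-negative. */
    private static void update(double[][] A, double[][] num, double[][] den) {
        for (int i = 0; i < A.length; i++)
            for (int k = 0; k < A[0].length; k++)
                A[i][k] *= num[i][k] / (den[i][k] + EPS);
    }

    private static double[][] randomPositive(int rows, int cols, Random rng) {
        double[][] A = new double[rows][cols];
        for (double[] r : A)
            for (int j = 0; j < cols; j++) r[j] = 0.1 + rng.nextDouble();
        return A;
    }

    private static double[][] transpose(double[][] A) {
        double[][] B = new double[A[0].length][A.length];
        for (int i = 0; i < A.length; i++)
            for (int j = 0; j < A[0].length; j++) B[j][i] = A[i][j];
        return B;
    }

    private static double[][] mul(double[][] A, double[][] B) {
        int m = A.length, p = B.length, q = B[0].length;
        double[][] C = new double[m][q];
        for (int i = 0; i < m; i++)
            for (int l = 0; l < p; l++)
                for (int j = 0; j < q; j++) C[i][j] += A[i][l] * B[l][j];
        return C;
    }

    public static void main(String[] args) {
        double[][] x = {{3, 0, 1}, {2, 0, 1}, {0, 4, 1}, {0, 3, 2}}; // V = 4 terms, N = 3 docs
        NmfTopicModel model = new NmfTopicModel(2, 200);
        model.readCorpus(x);
        model.train();
        System.out.println("error = " + reconstructionError(x, model.getTopicMatrix(), model.getIndicatorMatrix()));
    }
}
```

Because X is V-by-N, T is V-by-K and I is N-by-K, the product T I^T has the same shape as X, which is why a factorization-based model fits naturally into the topicMatrix/indicatorMatrix pair. The fixed iteration count keeps the sketch short; a practical implementation would usually also stop once the reconstruction error stops improving.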