index
module utilities.mallet
MalletClassifier
Interface module to Mallet.

Defines a couple of methods that create Mallet commands and a class that
provides an alternative interface.

cvs2vectors_command() creates a command that lets you take a file with lines
like below and create a binary vector file.

    ABC19980108.1830.0711.tml-ei377-ei378 BEFORE e1-asp=NONE e1-cls=OCCURRENCE
    e1-epos=None e1-mod=NONE e1-pol=POS e1-stem=None e1-str=assistance
    e1-tag=EVENT e1-ten=NONE e2-asp=PROGRESSIVE e2-cls=OCCURRENCE e2-epos=None
    e2-mod=NONE e2-pol=NEG e2-stem=None e2-str=helping e2-tag=EVENT
    e2-ten=PRESENT shAsp=1 shTen=1

train_model_command() creates a command that lets you take the binary vector and
create a classifier model.

See the docstring at the bottom of this module for some bitchy notes on
classify_command().
class MalletClassifier
Inherits from: object

Currently we take the command and then run a simple os.system(). It
doesn't really matter for model building, but for classification we should
at some point use subprocess to do this so we can write to and read from an
open pipe. This class has the code to do that, but it is not used yet.

Public Functions

__init__(self, mallet, name='--name 1', data='--data 3', regexp="--line-regex \"^(\S*)[\s,]*(\S*)[\s]*(.*)$\"")
Initialize a classifier by setting its options. All options are optional except for mallet which is the directory where Mallet lives.
__str__(self)
add_classifiers(self, *classifier_paths)
classify_file(self)
Classify a file and write results to a file. This assumes that input vectors are given as a filename and that output goes to a file (that is, both vectors and output are not None). This would run a command where the command would be very similar to what classify_command() does.
classify_vectors(self, classifier, vector_list)
Given a list of vectors in string format, run them all through the classifier and return the results.
pp(self)
Pretty priniter for the MalletClassifier.

Private Functions

_make_pipe(self, classifier)
Open a pipe into the classifier command.
_pipe_command(self, classifier)
Assemble the classifier command for use in a pipe.
_shell_command(self, classifier, vectors)
Assemble the classifier command for command line use with input and output files.
module functions
classify_command(mallet, vectors, model)
The command for running the classifier over a vector file.
cvs2vectors_command(mallet, vectors, output=False)
The command for creating a binary vector file from a text vector file. Each line in vectors looks like "ID LABEL FEAT1 FEAT2...".
parse_classifier_line(line)
Return a pair of the identifier (instance name in Mallet talk) and a sorted list of labels and their scores in <score, label> format, with the highest score first.
train_model_command(mallet, vectors, trainer='MaxEnt', cross_validation=False)
The command for creating a model from a binary vector. There is no cross validation by default, use cross_validation=5 to turn it on using 5-fold cross validation.