module components.evita.features
AChunkFeatures
ChunkFeatures
NChunkFeatures
VChunkFeatures
VChunkFeaturesList
This module contains classes that add grammatical features to NounChunks,
VerbChunks and AdjectiveTokens. The grammatical features drive part of the event
recognition.
class AChunkFeatures
Inherits from: ChunkFeatures

Contains the grammatical features for an AdjectiveToken. There is a
little naming disconnect here since we call these chunk features.

Public Functions

__init__(self, adjectivetoken, verbfeatures=None)
Initialize with an AdjectiveToken and use default values for most instance variables, but percolate grammatical features from the copular verb if they were handed in.
getEventClass(self)
Return I_STATE if the head is on a short list of intentional state adjectives, return STATE otherwise.
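The choice between the two classes can be pictured with a minimal sketch, assuming the head is available as a plain string. The names `INTENTIONAL_STATE_ADJECTIVES` and `adjective_event_class` are hypothetical, and the entries in the set are placeholders rather than the toolkit's actual short list.

```python
# Hypothetical stand-in for the short list of intentional state adjectives.
# The entries are placeholders, not the list used by Evita.
INTENTIONAL_STATE_ADJECTIVES = {"ready", "eager", "willing"}


def adjective_event_class(head_text):
    """Return 'I_STATE' if the head is on the short list, 'STATE' otherwise."""
    if head_text.lower() in INTENTIONAL_STATE_ADJECTIVES:
        return "I_STATE"
    return "STATE"
```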
class ChunkFeatures
Inherits from: object

The subclasses of this class are used to add grammatical features to a
NounChunk, VerbChunk or AdjectiveToken. An instance of one of these
subclasses is stored in the features variable of the chunk or token.

Public Functions

__init__(self, category, chunk_or_token, verbfeatures=None)
Common initialization for AChunkFeatures, NChunkFeatures and VChunkFeatures.
__str__(self)
add_verb_features(self, verbfeatures)
Set some features (tense, aspect, modality and polarity) to the values of those features on the governing verb.
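A rough sketch of this percolation, assuming the four features are plain instance attributes. The class `FeaturesSketch` and its default values are hypothetical and only stand in for ChunkFeatures.

```python
class FeaturesSketch:
    """Hypothetical stand-in for ChunkFeatures with only the four percolated features."""

    def __init__(self, verbfeatures=None):
        # Placeholder defaults; the toolkit's real defaults may differ.
        self.tense = "NONE"
        self.aspect = "NONE"
        self.modality = "NONE"
        self.polarity = "POS"
        if verbfeatures is not None:
            self.add_verb_features(verbfeatures)

    def add_verb_features(self, verbfeatures):
        """Copy tense, aspect, modality and polarity from the governing verb."""
        for name in ("tense", "aspect", "modality", "polarity"):
            setattr(self, name, getattr(verbfeatures, name))
```

For example, an adjective's features built with the features of a negated past-tense copula would end up with that tense and polarity.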
as_verbose_string(self)
Debugging method to print the ChunkFeatures and its features.
print_vars(self)
Debugging method to print all variables.
class NChunkFeatures
Inherits from: ChunkFeatures

Contains the grammatical features for a NounChunk.

Public Functions

__init__(self, nounchunk, verbfeatures=None)
Initialize with a NounChunk and use default values for most instance variables.
getEventClass(self)
Get the event class for the ChunkFeatures. For nominals, the event class is always OCCURRENCE.
getEventLemma(self)
Return the lemma from the head of the chunk. If there is no head or the head has no lemma, then build it from the text using a stemmer.
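The fallback order can be sketched as follows. The function name `event_lemma` and its parameters are hypothetical; in particular, `stem` is a caller-supplied function standing in for whatever stemmer the toolkit uses.

```python
def event_lemma(head, text, stem):
    """Return the head's lemma, or stem the chunk text as a fallback.

    `head` is any object that may carry a `lemma` attribute, or None.
    `stem` is a function from a string to a string (an assumed interface).
    """
    lemma = getattr(head, "lemma", None)
    if lemma:
        return lemma
    # No head, or the head has no lemma: build the lemma from the text.
    return stem(text)
```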
class VChunkFeatures
Inherits from: ChunkFeatures

Contains the grammatical features for a VerbChunk. Applies some feature
rules from the evita library in the course of setting tense and aspect
features. Also has some methods that test whether the features indicate
that the node is of a particular kind (for example, is_become).

Public Functions

__init__(self, verbchunk, tCh, negMk, infMk, advPre, advPost)
Initialize with a verb chunk and the lists handed in from the VChunkFeaturesList object.
__str__(self)
apply_feature_rules(self)
Returns a triple of TENSE, ASPECT and CATEGORY given the tokens of the chunk, which are stored in self.trueChunk. Selects the rules relevant for the length of the chunk and applies them. Returns None if no rule applies.
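One way to picture the selection by chunk length is the sketch below. The rule format used here (a dictionary from length to a list of predicate/result pairs) is an assumption made for illustration; the actual rule format in the evita library is not shown in this document.

```python
def apply_feature_rules(tokens, rules_by_length):
    """Return the (tense, aspect, category) triple of the first matching rule.

    `rules_by_length` maps a chunk length to a list of (predicate, triple)
    pairs; this layout is hypothetical. Returns None if no rule applies.
    """
    for matches, triple in rules_by_length.get(len(tokens), []):
        if matches(tokens):
            return triple
    return None


# Illustrative rule only, not taken from the library.
example_rules = {
    2: [(lambda toks: toks[0] == "has", ("PRESENT", "PERFECTIVE", "VERB"))],
}
```

Only the rules registered for the length of the chunk are tried, so a one-token chunk never matches the two-token rule above.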
as_short_string(self)
as_verbose_string(self)
getEventClass(self)
Return the event class for the nominal, using the regular expressions in the library.
getHead(self)
Return the head, which is the last element of the core in self.trueChunk, return None if there is no such core.
getModality(self)
getPolarity(self)
getPreHead(self)
Return the element before the head, where the head is the last element of the core in self.trueChunk. Return None if there is no such element.
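The head and pre-head lookups amount to indexing from the end of the core. A minimal sketch, assuming the core is a plain Python list; the function names are hypothetical.

```python
def get_head(core):
    """Return the last element of the core, or None if the core is empty."""
    return core[-1] if core else None


def get_pre_head(core):
    """Return the element before the head, or None if there is no such element."""
    return core[-2] if core and len(core) > 1 else None
```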
isAuxVerb(self)
Return True if the head is an auxiliary verb.
is_be(self)
is_become(self)
is_continue(self)
is_do_auxiliar(self)
is_future_going_to(self)
is_have(self)
is_keep(self)
is_modal(self)
is_past_used_to(self)
is_wellformed(self)
Return True if the verb features are well-formed, that is, there is content in the trueChunk's core feature and there is a head.
normalizeHave(self, form)
normalizeMod(self, form)
pp(self)
set_tense_and_aspect(self)
Sets the tense and aspect attributes by overwriting the default values with results from the feature rules in FEATURE_RULES. If no feature rules applied, create a throw-away features list for the head and use the features from there (which might still be defaults).
class VChunkFeaturesList
Inherits from: object

This class is used to create a list of VChunkFeatures instances. It
(1) collects information from a VerbChunk or a list of Tokens, (2) moves
this information into separate bins depending on the type of items in the
source, (3) decides whether we need more than one instance for some input,
and (4) creates a list of VChunkFeatures.

On initialization, an instance of NChunkFeatures is given a NounChunk, but a
VChunkFeaturesList is given a VerbChunk or a list of Tokens (or maybe
other categories as well). VerbChunks are different from NounChunks in that
there can be more than one VChunkFeatures instance for a single
VerbChunk. This is not very common, but it happens for example in

   "More problems in Hong Kong for a place, for an economy, that many
    experts [thought was] once invincible."

where "thought was" ends up as one verb chunk, but we get two feature sets.

Another difference is that sometimes a VChunkFeatures instance is created
for a sequence that includes tokens to the right of the VerbChunk, for
example in

   "All Arabs [would have] [to move] behind Iraq."

where there are two adjacent VerbChunks. With the current implementation,
when processing [would have], we end up creating VChunkFeatures instances
for "would have" and "would have to move", and then, when dealing with "to
move", we create a VChunkFeatures instance for "to move".

TODO: check whether "would have" and "to move" should be ruled out
TODO: check why "to move" is not already ruled out through the flag

Note that in both cases, the root of the issue is that the chunking is not
appropriate for Evita.

TODO: consider updating the Chunker and simplifying the code here.

Public Functions

__getitem__(self, index)
__init__(self, verbchunk=None, tokens=None)
Initialize several kinds of lists, distributing information from the VerbChunk or list of Tokens that is handed in on initialization and create a list of VChunkFeatures instances in self.featuresList.
__len__(self)
__str__(self)
print_ChunkLists(self)

Private Functions

_addInCurrentSublist(self, sublist, element)
Add the element to the current element (that is, the last element) in sublist. The elements of the sublist are lists themselves.
_addInPreviousSublist(self, sublist, element)
Add the element to the previous element (that is, the penultimate element) in sublist. The elements of the sublist are lists themselves.
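Both helpers operate on a list whose elements are lists. A minimal sketch, with hypothetical function names and no handling of the case where the outer list is too short:

```python
def add_in_current_sublist(sublist, element):
    """Append the element to the last inner list."""
    sublist[-1].append(element)


def add_in_previous_sublist(sublist, element):
    """Append the element to the penultimate inner list.

    Assumes there are at least two inner lists; the behaviour of the
    toolkit for shorter lists is not documented here.
    """
    sublist[-2].append(element)
```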
_distributeNode_ADV(self, item, tempNodes, itemCounter)
Add the adverb to an adverb list. The trick is to figure out which list to add it to. Factors are the location of the item in the tempNodes list and the POS tags of the elements following the item.
_distributeNode_MD(self, item)
Add the modal element to the core list.
_distributeNode_NEG(self, item)
Do not add the negation item to the core in self.trueChunkLists, but add it to the list with negation markers.
_distributeNode_TO(self, item, itemCounter)
If the item is the first one, just add it to the infinitive markers list. Otherwise, check whether the last element in the core is one of a small group ('going', 'used' and forms of 'have'): if it is, add the item to the core; if not, do nothing at all.
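The three-way decision can be sketched as below, treating items as plain strings. The function name, the parameter names and the `HAVE_FORMS` set are assumptions; the toolkit may test forms of 'have' differently.

```python
# Assumed set of forms of 'have'; the toolkit's own test may differ.
HAVE_FORMS = {"have", "has", "had", "having"}


def distribute_to(item, item_counter, core, inf_markers):
    """Place an infinitival 'to' in the markers list, in the core, or nowhere."""
    if item_counter == 0:
        # Chunk-initial 'to' is an infinitive marker.
        inf_markers.append(item)
    elif core and core[-1].lower() in {"going", "used"} | HAVE_FORMS:
        # 'going to', 'used to' and 'have to' keep 'to' inside the core.
        core.append(item)
    # In all other cases the item is dropped.
```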
_distributeNode_V(self, item, tempNodes, itemCounter)
Add a verb to the lists. This takes one of two actions, depending on the kind of verb we are dealing with and on whether it is followed by TO.
_distributeNodes(self)
Distribute the items' information over the lists in the VChunkFeaturesList.
_generate_features_list(self)
_initialize_lists(self)
Initializes the lists that contain items (Tokens) of the chunk. Since one chunk may spawn more than one VChunkFeatures instance, these lists are actually lists of lists.
_initialize_nodes(self)
Given the VerbChunk or a list of Tokens, set the nodes variable to either the daughters of the VerbChunk or the list of Tokens. Also sets node and tokens: the first holds the VerbChunk or None (so that we can hand the chunk to the VChunkFeatures instance, following ChunkFeatures behaviour), and the second holds the list of Tokens or None.
_item_is_followed_by_TO(self, tempNodes, itemCounter)
Return True if one of the next two tokens is TO, return False otherwise.
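A minimal sketch of this lookahead, assuming the nodes are represented by a list of POS tag strings; the function name and that representation are hypothetical.

```python
def item_is_followed_by_to(pos_tags, item_counter):
    """Return True if one of the next two tags after item_counter is 'TO'."""
    # Slicing is safe near the end of the list: it just yields fewer tags.
    return "TO" in pos_tags[item_counter + 1:item_counter + 3]
```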
_treatMainVerb(self, item, tempNodes, itemCounter)
Add a main verb to the trueChunks list. That is all that is done when the item is followed by adverbs only. In other cases, we have a chunk which has two subchunks and _updateChunkLists is called to introduce the second chunk. This is to deal with cases like 'might consider filing', where we want to end up with two events.
_updateChunkLists(self)
Append an empty list to the end of all lists maintained in the VChunkFeaturesList and update the counter.
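The lists-of-lists bookkeeping can be illustrated with a small stand-in class. `ChunkListsSketch` and its attribute names are hypothetical, and only three of the lists are modelled.

```python
class ChunkListsSketch:
    """Hypothetical container mimicking the lists-of-lists bookkeeping."""

    def __init__(self):
        # One inner list per VChunkFeatures instance that will be created.
        self.true_chunk_lists = [[]]
        self.neg_marks_lists = [[]]
        self.inf_marks_lists = [[]]
        self.counter = 0

    def all_lists(self):
        return [self.true_chunk_lists, self.neg_marks_lists, self.inf_marks_lists]

    def update_chunk_lists(self):
        """Open a new sub-chunk: append an empty list everywhere, bump the counter."""
        for lst in self.all_lists():
            lst.append([])
        self.counter += 1
```

After one call, items added at index `counter` land in the second inner list, which is how a second set of features would be collected separately from the first.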
module functions
debug(text, newline=True)
getPOSList(constituents)
Returns a list of parts-of-speech from the list of constituents, typically the constituents are instances of NounChunk, VerbChunk or Token. Used for debugging purposes.
getWordList(constituents)
Returns a list of words from the list of constituents, typically the constituents are instances of NounChunk, VerbChunk or Token. Used for debugging purposes.
getWordPosList(constituents)
Returns a list of word/POS for all constituents.
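A sketch of such a debugging helper, assuming each constituent exposes `text` and `pos` attributes and that the result is a list of "word/POS" strings. Both the attribute names and the string format are assumptions; `TokenSketch` is a hypothetical stand-in for the toolkit's classes.

```python
from collections import namedtuple

# Hypothetical stand-in for a Token with a surface form and a POS tag.
TokenSketch = namedtuple("TokenSketch", ["text", "pos"])


def word_pos_list(constituents):
    """Return a list of 'word/POS' strings, one per constituent."""
    return ["%s/%s" % (c.text, c.pos) for c in constituents]
```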