module components.evita.features
AChunkFeatures
ChunkFeatures
NChunkFeatures
VChunkFeatures
VChunkFeaturesList
This module contains classes that add grammatical features to NounChunks,
VerbChunks and AdjectiveTokens. The grammatical features drive part of the event
recognition.
class AChunkFeatures
Inherits from: ChunkFeatures
Contains the grammatical features for an AdjectiveToken. The naming is slightly
off here: an AdjectiveToken is not a chunk, but its features are still called
chunk features.
Public Functions
__init__(self, adjectivetoken, verbfeatures=None)
Initialize with an AdjectiveToken and use default values for most instance
variables, but percolate grammatical features from the copular verb if
they were handed in.
getEventClass(self)
Return I_STATE if the head is on a short list of intentional state
adjectives, return STATE otherwise.
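A minimal sketch of this decision. The set INTENTIONAL_STATE_ADJECTIVES and its contents are hypothetical stand-ins for the short list kept in the evita library, and the function works on a plain lemma string rather than on an AdjectiveToken.

```python
# Hypothetical stand-in for the short list of intentional-state adjectives;
# the real list is defined in the evita library and may differ.
INTENTIONAL_STATE_ADJECTIVES = {"afraid", "eager", "ready", "sure", "unsure"}


def get_adjective_event_class(head_lemma: str) -> str:
    """Return 'I_STATE' for intentional-state adjectives, 'STATE' otherwise."""
    if head_lemma.lower() in INTENTIONAL_STATE_ADJECTIVES:
        return "I_STATE"
    return "STATE"
```

With this toy list, "eager" maps to I_STATE and any adjective not on the list, such as "red", maps to STATE.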
class ChunkFeatures
Inherits from: object
The subclasses of this class are used to add grammatical features to a
NounChunk, VerbChunk or AdjectiveToken. An instance is stored in the features
variable of each of those objects.
Public Functions
__init__(self, category, chunk_or_token, verbfeatures=None)
Common initialization for AChunkFeatures, NChunkFeatures and
VChunkFeatures.
__str__(self)
add_verb_features(self, verbfeatures)
Set some features (tense, aspect, modality and polarity) to the values of
those features on the governing verb.
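The percolation of features from a governing verb can be sketched as follows. VerbFeatures and ChunkFeaturesSketch are hypothetical classes, and the default values ("NONE" for tense, aspect and modality, "POS" for polarity) are assumptions modeled on TimeML conventions, not values taken from the module.

```python
from dataclasses import dataclass


@dataclass
class VerbFeatures:
    # Hypothetical container for the governing verb's features.
    tense: str = "NONE"
    aspect: str = "NONE"
    modality: str = "NONE"
    polarity: str = "POS"


class ChunkFeaturesSketch:
    def __init__(self, verbfeatures=None):
        # Assumed defaults, used when no governing verb is handed in.
        self.tense = "NONE"
        self.aspect = "NONE"
        self.modality = "NONE"
        self.polarity = "POS"
        if verbfeatures is not None:
            self.add_verb_features(verbfeatures)

    def add_verb_features(self, verbfeatures):
        """Copy tense, aspect, modality and polarity from the governing verb."""
        for name in ("tense", "aspect", "modality", "polarity"):
            setattr(self, name, getattr(verbfeatures, name))
```

For example, ChunkFeaturesSketch(VerbFeatures(tense="PAST", polarity="NEG")) ends up with tense PAST and polarity NEG, while its aspect keeps the value NONE inherited from the verb.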
as_verbose_string(self)
Debugging method to print the ChunkFeatures and its features.
print_vars(self)
Debugging method to print all variables.
class NChunkFeatures
Inherits from: ChunkFeatures
Contains the grammatical features for a NounChunk.
Public Functions
__init__(self, nounchunk, verbfeatures=None)
Initialize with a NounChunk and use default values for most instance
variables.
getEventClass(self)
Get the event class for the ChunkFeatures. For nominals, the event
class is always OCCURRENCE.
getEventLemma(self)
Return the lemma from the head of the chunk. If there is no head or
the head has no lemma, then build it from the text using a stemmer.
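A sketch of the lemma fallback. The function toy_stem is a deliberately crude, hypothetical suffix stripper standing in for whatever stemmer the module actually uses; only the control flow (prefer the head's lemma, else stem the text) follows the description above.

```python
from typing import Optional


def toy_stem(word: str) -> str:
    """Crude stand-in for a real stemmer: strip one common suffix."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip when a reasonably long stem remains.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def get_event_lemma(head_lemma: Optional[str], text: str) -> str:
    """Prefer the head's lemma; otherwise build one from the text."""
    if head_lemma:
        return head_lemma
    return toy_stem(text)
```

A real stemmer handles far more cases; the point here is only that a missing head or an empty lemma triggers the stemming path.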
class VChunkFeatures
Inherits from: ChunkFeatures
Contains the grammatical features for a VerbChunk. Applies some feature
rules from the evita library in the course of setting tense and aspect
features. Also has some methods that test whether the features indicate
that the node is of a particular kind (for example, is_become).
Public Functions
__init__(self, verbchunk, tCh, negMk, infMk, advPre, advPost)
Initialize with a verb chunk and the lists handed in from the
VChunkFeaturesList object.
__str__(self)
apply_feature_rules(self)
Returns a triple of TENSE, ASPECT and CATEGORY given the tokens of
the chunk, which are stored in self.trueChunk. Selects the rules
relevant for the length of the chunk and applies them. Returns None if
no rule applies.
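A sketch of length-based rule selection. FEATURE_RULES_SKETCH is a hypothetical table that matches exact POS-tag sequences; the real FEATURE_RULES in the evita library use their own rule format, and the CATEGORY value "VERB" is an assumption.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (TENSE, ASPECT, CATEGORY)

# Hypothetical rule table keyed by chunk length: each rule pairs a sequence
# of POS tags with the triple it yields.
FEATURE_RULES_SKETCH: Dict[int, List[Tuple[Tuple[str, ...], Triple]]] = {
    1: [(("VBD",), ("PAST", "NONE", "VERB")),
        (("VBZ",), ("PRESENT", "NONE", "VERB"))],
    2: [(("VBZ", "VBG"), ("PRESENT", "PROGRESSIVE", "VERB")),
        (("VBD", "VBN"), ("PAST", "PERFECTIVE", "VERB"))],
}


def apply_feature_rules(pos_tags: List[str]) -> Optional[Triple]:
    """Try only the rules for this chunk length; return None if none match."""
    for pattern, triple in FEATURE_RULES_SKETCH.get(len(pos_tags), []):
        if tuple(pos_tags) == pattern:
            return triple
    return None
```

Keying the table by length means a two-token chunk such as "had eaten" (VBD VBN) is only compared against two-token rules.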
as_short_string(self)
as_verbose_string(self)
getEventClass(self)
Return the event class for the verb chunk, using the regular expressions
in the library.
getHead(self)
Return the head, which is the last element of the core in
self.trueChunk, or None if there is no such core.
getModality(self)
getPolarity(self)
getPreHead(self)
Return the element just before the head (the head being the last element
of the core in self.trueChunk), or None if there is no such element.
isAuxVerb(self)
Return True if the head is an auxiliary verb.
is_be(self)
is_become(self)
is_continue(self)
is_do_auxiliar(self)
is_future_going_to(self)
is_have(self)
is_keep(self)
is_modal(self)
is_past_used_to(self)
is_wellformed(self)
Return True if the verb features are well-formed, that is, there is
content in the core feature of self.trueChunk and there is a head.
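The head accessors and the well-formedness check can be sketched together. TrueChunkSketch is a hypothetical stand-in that models only the core list of self.trueChunk, as a plain list of words.

```python
from typing import List, Optional


class TrueChunkSketch:
    """Hypothetical stand-in: only the core list of self.trueChunk is modeled."""

    def __init__(self, core: List[str]) -> None:
        self.core = core

    def get_head(self) -> Optional[str]:
        # The head is the last element of the core.
        return self.core[-1] if self.core else None

    def get_pre_head(self) -> Optional[str]:
        # The element immediately before the head, if there is one.
        return self.core[-2] if len(self.core) >= 2 else None

    def is_wellformed(self) -> bool:
        # Well-formed: the core has content and a head can be found.
        return bool(self.core) and self.get_head() is not None
```

For the core "would have", the head is "have" and the pre-head is "would"; an empty core has no head and is not well-formed.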
normalizeHave(self, form)
normalizeMod(self, form)
pp(self)
set_tense_and_aspect(self)
Sets the tense and aspect attributes by overwriting the default
values with results from the feature rules in FEATURE_RULES. If no
feature rules applied, create a throw-away features list for the head
and use the features from there (which might still be defaults).
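A sketch of the fallback order: rules on the whole chunk first, then a throw-away evaluation on the head alone, then the defaults. The rule function is passed in as a parameter; toy_rules and DEFAULT_TENSE_ASPECT are hypothetical, and treating the last core tag as the head follows the getHead description.

```python
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (TENSE, ASPECT, CATEGORY)
RuleFn = Callable[[List[str]], Optional[Triple]]

DEFAULT_TENSE_ASPECT = ("NONE", "NONE")  # assumed default values


def set_tense_and_aspect(core_tags: List[str], apply_rules: RuleFn) -> Tuple[str, str]:
    """Return (tense, aspect) from the whole chunk, else from the head, else defaults."""
    result = apply_rules(core_tags)
    if result is None and core_tags:
        # Throw-away features computed from the head (last core element) only.
        result = apply_rules(core_tags[-1:])
    if result is None:
        return DEFAULT_TENSE_ASPECT
    tense, aspect, _category = result
    return tense, aspect


def toy_rules(tags: List[str]) -> Optional[Triple]:
    # Hypothetical rule set that knows a single pattern.
    return {("VBD",): ("PAST", "NONE", "VERB")}.get(tuple(tags))
```

With toy_rules, the tags MD VBD match no whole-chunk rule, but the head VBD does, so the result is ("PAST", "NONE"); a lone VBG matches nothing and keeps the defaults.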
class VChunkFeaturesList
Inherits from: object
This class is used to create a list of VChunkFeatures instances. It
(1) collects information from a VerbChunk or a list of Tokens, (2) moves this
information into separate bins depending on the type of items in the source,
(3) decides whether more than one instance is needed for some input, and
(4) creates a list of VChunkFeatures instances.
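Step (2) can be pictured as sorting (word, POS) pairs into bins, roughly mirroring the _distributeNode_NEG, _distributeNode_TO, _distributeNode_ADV and verb/modal handlers listed below. This is a simplified sketch: the bin names and the negation word list are hypothetical, TO always goes to the infinitive bin here (the real _distributeNode_TO can add it to the core), and step (3), splitting into several instances, is left out.

```python
from typing import Dict, List, Tuple

NEGATION_WORDS = {"not", "n't", "never"}  # hypothetical list


def bin_tokens(tagged: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Sort (word, POS) pairs into bins by item type."""
    bins: Dict[str, List[str]] = {
        "core": [], "negation": [], "infinitive": [], "adverbs": []}
    for word, pos in tagged:
        # Check negation first: "not" is tagged RB like other adverbs.
        if word.lower() in NEGATION_WORDS:
            bins["negation"].append(word)
        elif pos == "TO":
            bins["infinitive"].append(word)
        elif pos.startswith("RB"):
            bins["adverbs"].append(word)
        else:
            bins["core"].append(word)  # verbs and modals
    return bins
```

For "did not really go", the core is ["did", "go"], the negation bin holds "not" and the adverb bin holds "really".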
On initialization, an instance of NChunkFeatures is given a NounChunk, but a
VChunkFeaturesList is given a VerbChunk or a list of Tokens (or maybe
other categories as well). VerbChunks are different from NounChunks in that
there can be more than one VChunkFeatures instance for a single
VerbChunk. This is not very common, but it happens for example in
"More problems in Hong Kong for a place, for an economy, that many
experts [thought was] once invincible."
where "thought was" ends up as one verb chunk, but we get two feature sets.
Another difference is that sometimes a VChunkFeatures instance is created
for a sequence that includes tokens to the right of the VerbChunk, for
example in
"All Arabs [would have] [to move] behind Iraq."
where there are two adjacent VerbChunks. With the current implementation,
when processing [would have], we end up creating VChunkFeatures instances
for "would have" and "would have to move", and then, when dealing with "to
move", we create a VChunkFeatures instance for "to move".
TODO: check whether "would have" and "to move" should be ruled out
TODO: check why "to move" is not already ruled out through the flag
Note that in both cases, the root of the issue is that the chunking is not
appropriate for Evita.
TODO: consider updating the Chunker and simplifying the code here.
Public Functions
__getitem__(self, index)
__init__(self, verbchunk=None, tokens=None)
Initialize several kinds of lists, distribute the information from the
VerbChunk or list of Tokens handed in on initialization over those lists,
and create a list of VChunkFeatures instances in self.featuresList.
__len__(self)
__str__(self)
print_ChunkLists(self)
Private Functions
_addInCurrentSublist(self, sublist, element)
Add the element to the current element (that is, the last element) in
sublist. The elements of the sublist are lists themselves.
_addInPreviousSublist(self, sublist, element)
Add the element to the previous element (that is, the penultimate
element) in sublist. The elements of the sublist are lists themselves.
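The two helpers above amount to appending to the last or the penultimate inner list. A sketch with hypothetical function names; like the descriptions, it assumes the outer list already has enough inner lists (add_in_previous_sublist raises IndexError if there are fewer than two).

```python
from typing import Any, List


def add_in_current_sublist(sublist: List[List[Any]], element: Any) -> None:
    """Append element to the last inner list of sublist."""
    sublist[-1].append(element)


def add_in_previous_sublist(sublist: List[List[Any]], element: Any) -> None:
    """Append element to the penultimate inner list of sublist."""
    sublist[-2].append(element)
```

Starting from [["might"], ["filing"]], adding "consider" to the previous sublist gives [["might", "consider"], ["filing"]].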
_distributeNode_ADV(self, item, tempNodes, itemCounter)
Add the adverb to an adverb list; the trick is figuring out which list to
add it to. The deciding factors are the position of the item in the tempNodes
list and the POS tags of the elements following the item.
_distributeNode_MD(self, item)
Add the modal element to the core list.
_distributeNode_NEG(self, item)
Do not add the negation item to the core in self.trueChunkLists, but add it
to the list with negation markers.
_distributeNode_TO(self, item, itemCounter)
If the item is the first one, add it to the infinitive markers list.
Otherwise, check whether the last element in the core belongs to a small
group ('going', 'used' and forms of 'have'). If it does, add the item to the
core; if not, do nothing at all.
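A sketch of this three-way decision. The function name and the set TO_ATTACHING_WORDS are hypothetical (the exact forms of 'have' the module accepts are an assumption), and "the first one" is modeled as a zero-based item_counter equal to 0.

```python
from typing import List

# Hypothetical: 'going', 'used' and assumed forms of 'have'.
TO_ATTACHING_WORDS = {"going", "used", "have", "has", "had", "having"}


def distribute_to(item: str, item_counter: int,
                  core: List[str], infinitive_markers: List[str]) -> None:
    if item_counter == 0:
        # TO is the first item: it is an infinitive marker.
        infinitive_markers.append(item)
    elif core and core[-1].lower() in TO_ATTACHING_WORDS:
        # 'going to', 'used to', 'have to': TO becomes part of the core.
        core.append(item)
    # Otherwise the item is ignored.
```

So in "going to", TO joins the core, while after a verb like "want" it is dropped.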
_distributeNode_V(self, item, tempNodes, itemCounter)
Add a verb to the lists. This takes one of two actions, depending on the kind
of verb we are dealing with and on whether it is followed by TO.
_distributeNodes(self)
Distribute the information of each item over the lists in the
VChunkFeaturesList.
_generate_features_list(self)
_initialize_lists(self)
Initializes the lists that contain items (Tokens) of the chunk. Since
one chunk may spawn more than one VChunkFeatures instance, these
lists are actually lists of lists.
_initialize_nodes(self)
Given the VerbChunk or a list of Tokens, set the nodes variable to
either the daughters of the VerbChunk or the list of Tokens. Also sets
node and tokens, where the first one has the VerbChunk or None (this is
so we can hand the chunk to the VChunkFeatures instances, following
ChunkFeatures behaviour), and where the second one is the list of Tokens
or None.
_item_is_followed_by_TO(self, tempNodes, itemCounter)
Return True if one of the next two tokens is TO, return False otherwise.
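A sketch of the two-token lookahead, assuming the tokens are represented by their POS tags and itemCounter is the zero-based index of the current item; the function name is hypothetical. Slicing keeps the check safe at the end of the list.

```python
from typing import List


def item_is_followed_by_to(pos_tags: List[str], item_counter: int) -> bool:
    """True if either of the next two tags after item_counter is TO."""
    return "TO" in pos_tags[item_counter + 1 : item_counter + 3]
```

For "would have to move" (MD VB TO VB), the check at "have" (index 1) looks at TO VB and returns True.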
_treatMainVerb(self, item, tempNodes, itemCounter)
Add a main verb to the trueChunks list. That is all that is done when the
item is followed by adverbs only. In other cases, we have a chunk which
has two subchunks and _updateChunkLists is called to introduce the
second chunk. This is to deal with cases like 'might consider filing',
where we want to end up with two events.
_updateChunkLists(self)
Append an empty list to the end of all lists maintained in the
VChunkFeaturesList and update the counter.
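The lists-of-lists bookkeeping from _initialize_lists and _updateChunkLists can be sketched as below. ChunkListsSketch, its list names, and a counter that starts at 0 and counts opened sub-chunks are all assumptions made for illustration.

```python
from typing import Dict, List


class ChunkListsSketch:
    """Hypothetical bookkeeping: one inner list per prospective instance."""

    LIST_NAMES = ("core", "negation", "infinitive", "adverbs_pre", "adverbs_post")

    def __init__(self) -> None:
        # Start with a single (empty) sub-chunk in every list.
        self.lists: Dict[str, List[List[str]]] = {
            name: [[]] for name in self.LIST_NAMES}
        self.counter = 0

    def update_chunk_lists(self) -> None:
        """Open a new sub-chunk: append an empty inner list everywhere."""
        for inner in self.lists.values():
            inner.append([])
        self.counter += 1
```

Keeping every list the same length means index i in each list always refers to the same VChunkFeatures instance, which is what lets one chunk such as "might consider filing" yield two.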
module functions
debug(text, newline=True)
getPOSList(constituents)
Returns a list of parts of speech for the list of constituents, which are
typically instances of NounChunk, VerbChunk or Token. Used for debugging
purposes.
getWordList(constituents)
Returns a list of words for the list of constituents, which are typically
instances of NounChunk, VerbChunk or Token. Used for debugging purposes.
getWordPosList(constituents)
Returns a list of word/POS for all constituents.
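A sketch of the three debugging helpers over a hypothetical minimal Token class with text and pos attributes. The module's Token, NounChunk and VerbChunk classes expose their words and tags differently, and chunks are not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    # Hypothetical minimal token; the module's classes differ.
    text: str
    pos: str


def get_word_list(constituents: List[Token]) -> List[str]:
    return [c.text for c in constituents]


def get_pos_list(constituents: List[Token]) -> List[str]:
    return [c.pos for c in constituents]


def get_word_pos_list(constituents: List[Token]) -> List[str]:
    # One "word/POS" string per constituent.
    return [f"{c.text}/{c.pos}" for c in constituents]
```

For the tokens "would" (MD) and "have" (VB), the word/POS list is ["would/MD", "have/VB"].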