module components.common_modules.chunks
Chunk
NounChunk
VerbChunk
Implements the behaviour of chunks.

Chunks are embedded in sentences and contain event tags, timex tags and
instances of Token.

Much of the functionality of Evita and Slinket is delegated to chunks.
class Chunk
Inherits from: Constituent

Implements the common behaviour of chunks. Chunks are embedded in sentences
and contain event tags, timex tags and tokens.

Instance variables (in addition to the ones defined on Constituent)
   phraseType         - string indicating the chunk type, either 'vg' or 'ng'
   head = -1          - the index of the head of the chunk
   features = None    - an instance of NChunkFeatures or VChunkFeatures
   features_list = [] - a list of VChunkFeatures, used for verb chunks
   event = None       - set to True if the chunk contains an event
   eid = None         - set to an identifier if the chunk contains an event
   eiid = None        - set to an identifier if the chunk contains an event
   checkedEvents = False

Some of these variables are set to a non-default value at initialization,
but most of them are filled in during processing. The variables event, eid
and eiid are generated during TarsqiTree construction. They are all None
when a tree is created for Evita, but can have values for components later
in the pipeline.

Public Functions

__init__(self, phraseType)
embedded_event(self)
Returns the embedded event of the chunk if it has one, returns None otherwise. It is used to get the events for Slinket.
feature_value(self, name)
Used by the matcher and needs cases for all instance variables used in the pattern matching phase. A similar method is used on Token.
getHead(self)
Return the head of the chunk (by default the last element).
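The default value of -1 for head fits the "last element" default: used as a list index, -1 selects the final item. A minimal sketch, assuming the daughters of a chunk are kept in a list attribute called dtrs (a name not given in this documentation) and using a stripped-down stand-in rather than the real class:

```python
class Chunk:
    """Stripped-down stand-in for the real Chunk; only the head logic is shown."""

    def __init__(self, phraseType):
        self.phraseType = phraseType  # 'vg' or 'ng'
        self.head = -1                # default: index of the last element
        self.dtrs = []                # assumed name for the list of daughters

    def getHead(self):
        # Index the daughters with the head index; -1 selects the last one.
        return self.dtrs[self.head]


chunk = Chunk('ng')
chunk.dtrs = ['the', 'sudden', 'explosion']  # plain strings instead of Tokens
print(chunk.getHead())  # explosion
```

Setting head to another index, as a subclass or a later processing step might do, changes which daughter is returned without changing getHead itself.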
isChunk(self)
Returns True.
pretty_print(self, indent=0)

Private Functions

_conditionallyAddEvent(self, features=None)
Perform a few checks on the head and check whether there is an event class, then add the event to the tree. When this is called on a NounChunk, no GramChunk is handed in and it is retrieved from the features instance variable. When it is called from a VerbChunk, the verb's features are handed in.
_conditionally_add_imported_event(self, imported_event)
Create an event from the imported event, mixing information found in the chunk and in the imported event. Taken from the imported event are the class (which means we potentially move away from the TimeML event classes) and the begin and end of the imported event. The begin and end are stored in the new 'full-range' feature, which is needed because Evita assumes that all events are one token long.
_getHeadText(self)
Get the text string of the head of the chunk. Used by matchConstituent.
class NounChunk
Inherits from: Chunk

Behaviour specific to noun chunks, most notably the NounChunk specific
code to create events.

Public Functions

__init__(self)
createEvent(self, verbfeatures=None, imported_events=None)
Try to create an event in the NounChunk. Checks whether the nominal is an event candidate, then conditionally adds it. The verbfeatures dictionary is used when a governing verb hands in its features to a nominal in a predicative complement. The imported_events dictionary is handed in when Tarsqi tries to import events from a previous annotation.
head_is_common_noun(self)
Returns True if the head of the chunk is a common noun.
head_is_noun(self)
Returns True if the head of the chunk is a noun.
isDefinite(self)
Return True if self includes a Token that is a POS, PRP$ or a definite determiner.
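The definiteness check can be pictured as a scan over part-of-speech tags. The sketch below is illustrative only: the Token stand-in, the function name is_definite and the list of definite determiners are assumptions, not taken from the module.

```python
from collections import namedtuple

# Hypothetical stand-in for Token; the real class has more attributes.
Token = namedtuple('Token', ['text', 'pos'])

# Assumed list of definite determiners, not taken from the source.
DEFINITE_DETERMINERS = {'the', 'this', 'that', 'these', 'those'}


def is_definite(tokens):
    """Return True if any token is a POS, a PRP$ or a definite determiner."""
    for token in tokens:
        # Possessive ending ('s) or possessive pronoun (his, their, ...).
        if token.pos in ('POS', 'PRP$'):
            return True
        # A determiner only counts if it is one of the definite ones.
        if token.pos == 'DT' and token.text.lower() in DEFINITE_DETERMINERS:
            return True
    return False
```

For example, is_definite([Token('the', 'DT'), Token('war', 'NN')]) returns True, while the same call with Token('a', 'DT') returns False.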
isEmpty(self)
Return True if the chunk is empty, False otherwise.
isNounChunk(self)
Returns True.

Private Functions

_get_imported_event_for_chunk(self, imported_events)
Return None or a Tag from the imported_events dictionary. The tag is only returned if its span is head-final to the chunk and the span at least includes the chunk head.
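One possible reading of the span condition, sketched with character offsets. Everything here is an assumption made for illustration: the Span stand-in, the function name, and the interpretation of "head final" as "the tag ends where the chunk ends".

```python
from collections import namedtuple

# Hypothetical stand-in for chunks, heads and tags; only offsets are modelled.
Span = namedtuple('Span', ['begin', 'end'])


def imported_event_for_chunk(chunk, head, imported_events):
    """Return the first imported tag that ends where the chunk ends and
    covers the head, or None if there is no such tag."""
    for tag in imported_events.values():
        # Assumed reading of "head final": the tag ends at the chunk end.
        head_final = tag.end == chunk.end
        # The tag has to include the whole span of the chunk head.
        covers_head = tag.begin <= head.begin and tag.end >= head.end
        if head_final and covers_head:
            return tag
    return None
```

With a chunk at offsets 10-30 and a head at 21-30, a tag spanning 15-30 would be returned, while tags spanning 10-20 or 25-30 would not.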
_passes_semantics_test(self)
Return True if the nominal can be an event semantically. Depending on user settings this is done by a mixture of wordnet lookup and using a simple classifier.
_passes_syntax_test(self)
Return True if the nominal is syntactically able to be an event, return False otherwise. Syntactically, an event candidate has to have a head that cannot be a timex, and the head has to be either a noun or a common noun, depending on the value of INCLUDE_PROPERNAMES.
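The syntax test can be sketched as two checks on the head. The Head stand-in and the function signature are hypothetical, and the sketch assumes that a true INCLUDE_PROPERNAMES setting is what admits proper nouns; the tag names are the standard Penn Treebank noun tags.

```python
from collections import namedtuple

# Hypothetical stand-in for the head token of a noun chunk.
Head = namedtuple('Head', ['pos', 'is_timex'])

# Penn Treebank tags: NN and NNS are common nouns, NNP and NNPS proper nouns.
COMMON_NOUN_TAGS = {'NN', 'NNS'}
NOUN_TAGS = COMMON_NOUN_TAGS | {'NNP', 'NNPS'}


def passes_syntax_test(head, include_propernames):
    """Return True if the head is not a timex and has an allowed noun tag."""
    if head.is_timex:
        return False
    # Assumption: proper nouns are only admitted when the setting is on.
    allowed = NOUN_TAGS if include_propernames else COMMON_NOUN_TAGS
    return head.pos in allowed
```

Under these assumptions a head tagged NNP passes only when include_propernames is True, and a head that is a timex never passes.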
_run_classifier(self, lemma)
Run the classifier on lemma, using features from the GramNChunk.
class VerbChunk
Inherits from: Chunk

Public Functions

__init__(self)
createEvent(self, imported_events=None)
Try to create one or more events in the VerbChunk. How this works depends on how many instances of VChunkFeatures can be created for the chunk. For all non-final and non-auxiliary elements in the list, just process them as events. For the chunk-final one there is more work to do.
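The split between the chunk-final element and the others can be sketched as follows. This is not the module's code: split_features_list and the is_auxiliary callback are hypothetical names, and plain strings stand in for VChunkFeatures instances.

```python
def split_features_list(features_list, is_auxiliary):
    """Return (simple, final): the non-final, non-auxiliary elements and the
    chunk-final element (None for an empty list)."""
    if not features_list:
        return [], None
    # Everything but the last element is non-final.
    *non_final, final = features_list
    # Auxiliaries among the non-final elements are skipped.
    simple = [f for f in non_final if not is_auxiliary(f)]
    return simple, final


# Strings stand in for VChunkFeatures; the auxiliary test is made up.
simple, final = split_features_list(
    ['have', 'try', 'leave'], lambda f: f in {'have', 'be', 'do'})
print(simple, final)  # ['try'] leave
```

The elements in simple would be processed directly as events, while final would go through the more elaborate handling described above.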
dribble(self, header, text)
Write information on the sentence that an event was added to.
isNotEventCandidate(self, features)
Return True if the chunk cannot possibly be an event. This is the place for performing some simple stoplist-like tests.
isVerbChunk(self)
Return True.

Private Functions

_createEventOnBe(self, features, imported_events=None)
_createEventOnBecome(self, features)
_createEventOnContinue(self, features)
_createEventOnDoAuxiliar(self, features)
_createEventOnFutureGoingTo(self, features)
_createEventOnHave(self, features)
_createEventOnKeep(self, features)
_createEventOnModal(self)
Try to create an event when the head of the chunk is a modal. Check the right context and see if you can extend the chunk into a complete verb group with modal verb and main verb. If so, process the merged constituents as a composed verb chunk.
_createEventOnOtherVerb(self, features)
_createEventOnPastUsedTo(self, features)
_createEventOnRightmostVerb(self, features, imported_events=None)
_getRestSent(self, structure_type)
Obtain the rest of the sentence as a list of tokens if structure_type is 'flat' and as a list of constituents if structure_type is 'chunked'. Log a warning and return a list of constituents for an unknown structure type.
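The difference between the two structure types can be illustrated with a toy model in which a chunk is a list of tokens and a token is a string. The function name and the data model are assumptions made for this sketch only.

```python
import logging

logger = logging.getLogger(__name__)


def get_rest_of_sentence(constituents, structure_type):
    """Return the constituents as they are ('chunked') or flattened to
    tokens ('flat'). A chunk is modelled as a list, a token as a string."""
    if structure_type == 'flat':
        tokens = []
        for constituent in constituents:
            if isinstance(constituent, list):
                tokens.extend(constituent)   # unpack the chunk
            else:
                tokens.append(constituent)   # already a token
        return tokens
    if structure_type != 'chunked':
        # Unknown type: warn and fall back to the chunked representation.
        logger.warning("unknown structure type: %s", structure_type)
    return constituents


rest = [['the', 'talks'], 'in', ['Geneva']]
print(get_rest_of_sentence(rest, 'flat'))  # ['the', 'talks', 'in', 'Geneva']
```

The flat form suits patterns stated over individual tokens, while the chunked form suits patterns stated over whole constituents.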
_identify_substring(self, sentence_slice, fsa_list)
Similar to Constituent._identify_substring(), except that this method calls acceptsSubstringOf() instead of acceptsShortestSubstringOf(). In some tests, for example in evita-test2.sh, this version results in a small number of extra events.
_lookForMultiChunk(self, FSA_set, structure_type='flat')
Returns the prefix of the rest of the sentence if it matches one of the FSAs in FSA_set. The structure_type argument specifies the structural format of the rest of the sentence: either a flat, token-level representation or a chunked one. This method is used for finding specific right contexts of verb chunks.
_processDoubleEventInMultiAChunk(self, features, substring)
Tag an EVENT in both the VerbChunk and the AdjectiveToken. In this case the adjective will not be given the verb features.
_processEventInMultiAChunk(self, features, substring)
_processEventInMultiNChunk(self, features, substring, imported_events)
_processEventInMultiVChunk(self, substring)
module functions
update_event_checked_marker(constituent_list)
Update the position in the sentence by marking the Tokens and Chunks in constituent_list as already checked for EVENT. These are constituents that are included in a chunk where an event was found.
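A minimal sketch of the marking step, assuming the marker is the checkedEvents variable listed for Chunk and that it is set directly on each constituent; the real module may go through a setter instead. The Constituent class below is a hypothetical stand-in for Token and Chunk instances.

```python
class Constituent:
    """Hypothetical stand-in for Token and Chunk instances."""

    def __init__(self, name):
        self.name = name
        self.checkedEvents = False  # not yet checked for EVENT


def update_event_checked_marker(constituent_list):
    # Flag every constituent so that later passes can skip it.
    for constituent in constituent_list:
        constituent.checkedEvents = True


items = [Constituent('may'), Constituent('have'), Constituent('left')]
update_event_checked_marker(items[:2])
print([c.checkedEvents for c in items])  # [True, True, False]
```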