module components.common_modules.chunks
Chunk
NounChunk
VerbChunk
Implements the behaviour of chunks.
Chunks are embedded in sentences and contain event tags, timex tags and
instances of Token.
Much of the functionality of Evita and Slinket is delegated to chunks.
class Chunk
Inherits from: Constituent
Implements the common behaviour of chunks. Chunks are embedded in sentences
and contain event tags, timex tags and tokens.
Instance variables (in addition to the ones defined on Constituent)
phraseType - string indicating the chunk type, either 'vg' or 'ng'
head = -1 - the index of the head of the chunk
features = None - an instance of NChunkFeatures or VChunkFeatures
features_list = [] - a list of VChunkFeatures, used for verb chunks
event = None - set to True if the chunk contains an event
eid = None - set to an identifier if the chunk contains an event
eiid = None - set to an identifier if the chunk contains an event
checkedEvents = False
Some of these variables are set to a non-default value at initialization,
but most of them are filled in during processing. The variables event, eid
and eiid are generated during TarsqiTree construction. They are all None
when a tree is created for Evita, but can have values for components later
in the pipeline.
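The defaults listed above can be pictured with a minimal stand-alone sketch. This is not the toolkit's code: the class name ChunkSketch is hypothetical and the Constituent base class is left out. Only the attribute names and default values are taken from the list.

```python
class ChunkSketch:
    """Hypothetical stand-in for Chunk; the Constituent base class is omitted."""

    def __init__(self, phraseType):
        self.phraseType = phraseType  # 'vg' or 'ng'
        self.head = -1                # index of the head of the chunk
        self.features = None          # later an NChunkFeatures or VChunkFeatures
        self.features_list = []       # VChunkFeatures instances, verb chunks only
        self.event = None             # filled in when the chunk contains an event
        self.eid = None
        self.eiid = None
        self.checkedEvents = False


chunk = ChunkSketch('ng')
```

Creating the list inside __init__ gives every chunk its own features_list, so appending to one chunk's list does not affect another chunk.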
Public Functions
__init__(self, phraseType)
embedded_event(self)
Returns the embedded event of the chunk if it has one and None otherwise.
It is used to get the events for Slinket.
feature_value(self, name)
Used by the matcher and needs cases for all instance variables used in the
pattern matching phase. A similar method is used on Token.
getHead(self)
Return the head of the chunk (by default the last element).
isChunk(self)
Returns True.
pretty_print(self, indent=0)
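The behaviour of getHead and embedded_event described above can be sketched as follows. The classes TokenSketch and ChunkSketch, the dtrs attribute holding the children, and the isEvent method on children are assumptions made for illustration and may not match the toolkit's actual names.

```python
class TokenSketch:
    """Hypothetical child element of a chunk."""

    def __init__(self, text, is_event=False):
        self.text = text
        self.is_event = is_event

    def isEvent(self):
        return self.is_event


class ChunkSketch:
    """Hypothetical stand-in for Chunk."""

    def __init__(self, daughters, head=-1):
        self.dtrs = list(daughters)  # assumed attribute name for the children
        self.head = head

    def getHead(self):
        # With the default index of -1 this is the last element.
        return self.dtrs[self.head]

    def embedded_event(self):
        # Return the first child marked as an event, or None if there is none.
        for dtr in self.dtrs:
            if dtr.isEvent():
                return dtr
        return None

    def isChunk(self):
        return True


chunk = ChunkSketch([TokenSketch('the'), TokenSketch('explosion', is_event=True)])
plain = ChunkSketch([TokenSketch('the'), TokenSketch('table')])
```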
Private Functions
_conditionallyAddEvent(self, features=None)
Perform a few small checks on the head and check whether there is
an event class, then add the event to the tree. When this is called on
a NounChunk, no GramChunk is handed in and it is retrieved from the
features instance variable. When it is called from VerbChunk, the
verb's features are handed in.
_conditionally_add_imported_event(self, imported_event)
Create an event from the imported event, mixing information found in
the chunk and in the imported event. Added from the imported event is
the class (which means we potentially move away from the TimeML event
classes) and the begin and end of the imported event which we store in
the new 'full-range' feature, which is needed because Evita assumes
events are all one-token.
_getHeadText(self)
Get the text string of the head of the chunk. Used by
matchConstituent.
class NounChunk
Inherits from: Chunk
Behaviour specific to noun chunks, most notably the NounChunk specific
code to create events.
Public Functions
__init__(self)
createEvent(self, verbfeatures=None, imported_events=None)
Try to create an event in the NounChunk. Checks whether the nominal
is an event candidate, then conditionally adds it. The verbfeatures
dictionary is used when a governing verb hands in its features to a
nominal in a predicative complement. The imported_events is handed in
when Tarsqi tries to import events from a previous annotation.
head_is_common_noun(self)
Returns True if the head of the chunk is a common noun.
head_is_noun(self)
Returns True if the head of the chunk is a noun.
isDefinite(self)
Return True if self includes a Token that is a POS, PRP$ or a definite
determiner.
isEmpty(self)
Return True if the chunk is empty, False otherwise.
isNounChunk(self)
Returns True.
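The head and definiteness tests above can be illustrated with a small sketch over (text, part-of-speech) pairs. It assumes Penn Treebank tags, takes the last pair as the head, and uses a guessed list of definite determiners; the function names and the pair representation are hypothetical and are not the NounChunk methods themselves.

```python
COMMON_NOUN_TAGS = {'NN', 'NNS'}
NOUN_TAGS = COMMON_NOUN_TAGS | {'NNP', 'NNPS'}
# Assumed list; the determiners the toolkit treats as definite may differ.
DEFINITE_DETERMINERS = {'the', 'this', 'that', 'these', 'those'}


def head_is_noun(tokens):
    """True if the last (text, pos) pair carries any noun tag."""
    return bool(tokens) and tokens[-1][1] in NOUN_TAGS


def head_is_common_noun(tokens):
    """True if the last (text, pos) pair carries a common noun tag."""
    return bool(tokens) and tokens[-1][1] in COMMON_NOUN_TAGS


def is_definite(tokens):
    """True if any token is a POS, a PRP$ or a definite determiner."""
    for text, pos in tokens:
        if pos in ('POS', 'PRP$'):
            return True
        if pos == 'DT' and text.lower() in DEFINITE_DETERMINERS:
            return True
    return False
```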
Private Functions
_get_imported_event_for_chunk(self, imported_events)
Return None or a Tag from the imported_events dictionary. Only return
this tag if its span is head-final to the chunk and at least includes
the chunk head.
_passes_semantics_test(self)
Return True if the nominal can be an event semantically. Depending
on user settings this is done by a mixture of WordNet lookup and using a
simple classifier.
_passes_syntax_test(self)
Return True if the nominal is syntactically able to be an event,
return False otherwise. An event candidate syntactically has to have a
head which cannot be a timex, and the head has to be either a noun or a
common noun, depending on the value of INCLUDE_PROPERNAMES.
_run_classifier(self, lemma)
Run the classifier on lemma, using features from the GramNChunk.
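Two of the NounChunk checks described above, the syntax test and the span condition for imported events, can be sketched as plain functions. This is one possible reading, not the toolkit's implementation: the function names, the Penn Treebank tags, the (begin, end) tuples, and the assumption that a true INCLUDE_PROPERNAMES admits proper nouns are all illustrative.

```python
INCLUDE_PROPERNAMES = False  # stands in for the user setting named above


def passes_syntax_test(head_pos, head_is_timex,
                       include_propernames=INCLUDE_PROPERNAMES):
    """Head may not be a timex and must be a noun (or common noun)."""
    if head_is_timex:
        return False
    if include_propernames:
        return head_pos in ('NN', 'NNS', 'NNP', 'NNPS')
    return head_pos in ('NN', 'NNS')


def imported_event_fits(event_span, head_span):
    """Assumed reading of the span condition: the imported event ends
    where the chunk head ends and starts at or before the head."""
    event_begin, event_end = event_span
    head_begin, head_end = head_span
    return event_end == head_end and event_begin <= head_begin
```

Under this reading an imported event covering a multi-word nominal is accepted as long as it ends on the head, which fits the note that the full range of the imported event has to be kept separately because Evita assumes one-token events.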
class VerbChunk
Inherits from: Chunk
Public Functions
__init__(self)
createEvent(self, imported_events=None)
Try to create one or more events in the VerbChunk. How this works
depends on how many instances of VChunkFeatures can be created for
the chunk. For all non-final and non-auxiliary elements in the list,
just process them as events. For the chunk-final one there is more work
to do.
dribble(self, header, text)
Write information on the sentence that an event was added to.
isNotEventCandidate(self, features)
Return True if the chunk cannot possibly be an event. This is the place
for performing some simple stoplist-like tests.
isVerbChunk(self)
Return True.
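The split that createEvent makes over the list of VChunkFeatures can be pictured with the sketch below. The function name plan_verb_events and the is_auxiliary callback are hypothetical, and plain strings stand in for VChunkFeatures instances; the sketch only shows the partition into directly processed elements and the chunk-final element, not the extra work done on the latter.

```python
def plan_verb_events(features_list, is_auxiliary):
    """Return (simple, final): the non-final, non-auxiliary elements that
    are processed directly as events, and the chunk-final element."""
    if not features_list:
        return [], None
    *non_final, final = features_list
    simple = [f for f in non_final if not is_auxiliary(f)]
    return simple, final


# Strings stand in for VChunkFeatures instances; 'f1' plays the auxiliary.
simple, final = plan_verb_events(['f1', 'f2', 'f3'], lambda f: f == 'f1')
```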
Private Functions
_createEventOnBe(self, features, imported_events=None)
_createEventOnBecome(self, features)
_createEventOnContinue(self, features)
_createEventOnDoAuxiliar(self, features)
_createEventOnFutureGoingTo(self, features)
_createEventOnHave(self, features)
_createEventOnKeep(self, features)
_createEventOnModal(self)
Try to create an event when the head of the chunk is a modal. Check
the right context and see if you can extend the chunk into a complete
verb group with modal verb and main verb. If so, process the merged
constituents as a composed verb chunk.
_createEventOnOtherVerb(self, features)
_createEventOnPastUsedTo(self, features)
_createEventOnRightmostVerb(self, features, imported_events=None)
_getRestSent(self, structure_type)
Obtain the rest of the sentence as a list of tokens if structure_type is
'flat' and as a list of constituents if structure_type is 'chunked'. Log a
warning and return a list of constituents for an unknown structure type.
_identify_substring(self, sentence_slice, fsa_list)
Similar to Constituent._identify_substring(), except that this method
calls acceptsSubstringOf() instead of acceptsShortestSubstringOf(). In
some tests, for example in evita-test2.sh, this version results in a
small number of extra events.
_lookForMultiChunk(self, FSA_set, structure_type='flat')
Returns the prefix of the rest of the sentence if it matches one of
the FSAs in FSA_set. The structure_type argument specifies the
structural format of the rest of the sentence: either a flat,
token-level representation or a chunked one. This method is used for
finding specific right contexts of verb chunks.
_processDoubleEventInMultiAChunk(self, features, substring)
Tags an EVENT in both the VerbChunk and the AdjectiveToken. In this case
the adjective will not be given the verb features.
_processEventInMultiAChunk(self, features, substring)
_processEventInMultiNChunk(self, features, substring, imported_events)
_processEventInMultiVChunk(self, substring)
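The selection logic described for _getRestSent can be sketched as a stand-alone function. The name get_rest_of_sentence and its tokens and constituents parameters are hypothetical; the real method works on the chunk's position in its sentence rather than on lists passed in.

```python
import logging

logger = logging.getLogger(__name__)


def get_rest_of_sentence(tokens, constituents, structure_type):
    """Pick the flat or the chunked view of the rest of the sentence."""
    if structure_type == 'flat':
        return tokens
    if structure_type == 'chunked':
        return constituents
    # Unknown type: warn and fall back to the chunked view.
    logger.warning("unknown structure type: %s", structure_type)
    return constituents
```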
module functions
update_event_checked_marker(constituent_list)
Update the position in the sentence by marking the Tokens and Chunks in
constituent_list as already checked for EVENT. These are constituents that
are included in a chunk where an event was found.
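A minimal sketch of this marking step, assuming that being checked is recorded in the checkedEvents variable listed for Chunk. The class ConstituentSketch and the function name mark_events_checked are hypothetical, and the actual function may set the marker through a method rather than by direct assignment.

```python
class ConstituentSketch:
    """Hypothetical stand-in for a Token or Chunk."""

    def __init__(self, text):
        self.text = text
        self.checkedEvents = False


def mark_events_checked(constituent_list):
    """Mark every constituent in the list as already checked for EVENT."""
    for constituent in constituent_list:
        constituent.checkedEvents = True


items = [ConstituentSketch('may'), ConstituentSketch('leave')]
untouched = ConstituentSketch('soon')
mark_events_checked(items)
```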