# Creating Tags

This document has some notes on how Tags are created, with particular attention
to when and how identifiers are added.

There are two kinds of Tags: 

- components.common_modules.tags.Tag, with subtypes EventTag and TimexTag
- docmodel.document.Tag, with subtypes OpeningTag and ClosingTag

This document is about the second one.

Tags are initialized in several spots:

- TagRepository.merge()
- SourceParserTTK
- PreprocessorWrapper
- TagRepository.add\_tag()
- create\_tarsqi\_tree()


#### docmodel.document.TagRepository.merge()

This is used when an XML document is parsed and source tags are added to the
SourceDoc. SourceDoc.finish() calls merge(), which takes the OpeningTags and
ClosingTags and merges them into Tags, taking the identifier from the
OpeningTag.

The merge is needed because the XML parser in SourceParserXML first triggers
add\_opening\_tag() and add\_closing\_tag(), which add OpeningTags and
ClosingTags to a temporary list of tags as found by the parser.

Tags added this way used to always get an `id` attribute, but that seemed
useless, so `id` attributes are no longer added.
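
The merge step can be pictured with a small stand-alone sketch. It is not the actual TagRepository code: the dataclasses, the `merge()` function signature, the error handling and the sort order are all assumptions made for illustration. It only shows the general idea of pairing each ClosingTag with the most recent unmatched OpeningTag and carrying over the attributes of the OpeningTag.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the classes in docmodel.document.
@dataclass
class OpeningTag:
    name: str
    offset: int
    attrs: dict = field(default_factory=dict)


@dataclass
class ClosingTag:
    name: str
    offset: int


@dataclass
class Tag:
    name: str
    begin: int
    end: int
    attrs: dict


def merge(tmp):
    """Pair each ClosingTag with the most recent unmatched OpeningTag."""
    stack, tags = [], []
    for t in tmp:
        if isinstance(t, OpeningTag):
            stack.append(t)
        elif stack and stack[-1].name == t.name:
            opening = stack.pop()
            # Attributes (including any identifier) come from the OpeningTag.
            tags.append(Tag(opening.name, opening.offset, t.offset, opening.attrs))
        else:
            raise ValueError("non-matching closing tag: %s" % t.name)
    if stack:
        raise ValueError("no closing tag for: %s" % stack[-1].name)
    # Assumed ordering: by begin offset, with longer tags first on ties.
    return sorted(tags, key=lambda tag: (tag.begin, -tag.end))


# The temporary list as a parser might fill it for <s><np id="x1">...</np>...</s>
tmp = [OpeningTag("s", 0), OpeningTag("np", 0, {"id": "x1"}),
       ClosingTag("np", 5), ClosingTag("s", 12)]
merged = merge(tmp)
```

Using a stack keeps the pairing correct for nested tags, at the cost of rejecting input where tags cross each other.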


#### docmodel.source_parser.SourceParserTTK

This loads the DOM and then adds DOM Nodes as Tags to the TarsqiDocument or
SourceDoc using \_add\_to\_tag\_repository(), which calls add\_tag().


#### components.preprocessor.wrapper.PreprocessorWrapper

Tags are created when the PreprocessorWrapper exports its results to the
TagRepository on the TarsqiDocument. Tag identifiers in the `id` attribute are
created in the wrapper module by the TagId class, which maintains counters for
s, ng, vg and lex tags (where ng and vg tags share a counter). Tags are also
added by the TokenizerWrapper (for s and lex tags) and the ChunkerWrapper (for
ng and vg tags), but not by the TaggerWrapper because part-of-speech and lemma
are added to existing Tags.
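
As a rough illustration of the counter scheme, the sketch below keeps one counter per identifier prefix and maps ng and vg to the same prefix so that they share a counter. The class name `TagIdSketch`, the prefixes (`s`, `l`, `c`) and the identifier format are assumptions for this example and may differ from what the TagId class in the wrapper module actually does.

```python
class TagIdSketch:
    """Hypothetical identifier generator with a shared counter for chunks."""

    # Assumed prefixes: ng and vg both map to "c", so they share a counter.
    PREFIXES = {"s": "s", "lex": "l", "ng": "c", "vg": "c"}

    def __init__(self):
        self.counters = {"s": 0, "l": 0, "c": 0}

    def next(self, tagname):
        """Return the next identifier for the given tag name."""
        prefix = self.PREFIXES[tagname]
        self.counters[prefix] += 1
        return "%s%d" % (prefix, self.counters[prefix])


ids = TagIdSketch()
first_s = ids.next("s")      # sentence counter
first_ng = ids.next("ng")    # chunk counter, shared with vg
first_vg = ids.next("vg")    # continues the same chunk counter
first_lex = ids.next("lex")  # token counter
```

Because the chunk counter is shared, an ng and a vg tag can never receive the same identifier.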


#### docmodel.document.TagRepository.add_tag()

Used by processing components, the SourceParserTTK class, the conversion code in
utility.convert and convenience methods on TarsqiDocument. If identifiers are
added, they are generated by the upstream code that creates the attributes
dictionary for the Tag.
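
The division of labour described here, where the caller builds the attributes dictionary and the repository only stores it, might look roughly like the sketch below. The `TagRepositorySketch` class, the `add_tag(name, begin, end, attrs)` signature, the dictionary-based tag representation and the example attribute names are assumptions for illustration, not the real API.

```python
class TagRepositorySketch:
    """Hypothetical repository that stores tags without generating identifiers."""

    def __init__(self):
        self.tags = []

    def add_tag(self, name, begin, end, attrs):
        # The attributes are stored as given; if the caller did not put an
        # identifier in attrs, the tag simply has none.
        self.tags.append({"name": name, "begin": begin, "end": end,
                          "attrs": dict(attrs)})


repo = TagRepositorySketch()

# Upstream code that wants an identifier creates it before calling add_tag().
repo.add_tag("EVENT", 10, 16, {"eid": "e1", "class": "OCCURRENCE"})

# Upstream code that does not need an identifier just leaves it out.
repo.add_tag("docelement", 0, 40, {"type": "paragraph"})
```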


#### components.common_modules.tree.create\_tarsqi\_tree()

Uses a Tag with just begin and end offsets as one step in the process of
building an instance of TarsqiTree. No other Tags are created, but note that the
Node objects contain Tag instances that were created earlier.


