TARSQI Toolkit - GUTime

The GUTimeWrapper grabs sentences and lexical items from the TarsqiDocument and creates the input needed by components/gutime/TimeTag.pl, which is the wrapper around TempEx.pm in the same directory. The input required by TimeTag.pl is a file with content as follows:

<DOC>
<DATE>20160102</DATE>
<s>
   <lex id="l1" begin="1" end="5" pos="NNP">Fido</lex>
   <lex id="l2" begin="6" end="11" pos="NNS">barks</lex>
   <lex id="l3" begin="12" end="14" pos="IN">on</lex>
   <lex id="l4" begin="15" end="21" pos="NNP">Monday</lex>
   <lex id="l5" begin="21" end="22" pos=".">.</lex>
</s>
</DOC>
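
A minimal sketch of how a document in this format could be assembled is given below. It is not the wrapper's actual code: the function name build_gutime_input and the tuple layout used for lexical items are assumptions made for the illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def build_gutime_input(dct, sentences):
    """Return the <DOC> text for a DCT string and a list of sentences.

    Each sentence is a list of (lex_id, begin, end, pos, text) tuples.
    This tuple layout is hypothetical, chosen only for the example.
    """
    lines = ["<DOC>", "<DATE>%s</DATE>" % escape(dct)]
    for sentence in sentences:
        lines.append("<s>")
        for lex_id, begin, end, pos, text in sentence:
            # quoteattr adds the surrounding quotes and escapes the value;
            # escape protects characters such as & and < in the token text.
            lines.append('   <lex id=%s begin="%d" end="%d" pos=%s>%s</lex>'
                         % (quoteattr(lex_id), begin, end,
                            quoteattr(pos), escape(text)))
        lines.append("</s>")
    lines.append("</DOC>")
    return "\n".join(lines) + "\n"


example = build_gutime_input("20160102", [[
    ("l1", 1, 5, "NNP", "Fido"),
    ("l2", 6, 11, "NNS", "barks"),
    ("l3", 12, 14, "IN", "on"),
    ("l4", 15, 21, "NNP", "Monday"),
    ("l5", 21, 22, ".", "."),
]])
```

Escaping the token text matters because a raw & or < in a token would otherwise make the document ill-formed.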

The DOC root and the DATE tag are required; the DATE tag is how the document creation time (DCT) is handed to GUTime. Apart from these, only s and lex tags are allowed. GUTime does not require the lex tags to have begin and end attributes, but they may be present. Any amount of whitespace between the tags is allowed.

The wrapper creates the above text and then uses the Python subprocess module to run the Perl script, piping in the text. The output is identical to the input except that TIMEX3 tags are added:

<DOC>
<DATE><TIMEX3 VAL="20160102">20160102</TIMEX3></DATE>
<s>
   <lex id="l1" begin="1" end="5" pos="NNP">Fido</lex>
   <lex id="l2" begin="6" end="11" pos="NNS">barks</lex>
   <lex id="l3" begin="12" end="14" pos="IN">on</lex>
   <TIMEX3 tid="t1" TYPE="DATE"><lex id="l4" begin="15" end="21" pos="NNP">Monday</lex></TIMEX3>
   <lex id="l5" begin="21" end="22" pos=".">.</lex>
</s>
</DOC>
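
The round trip can be sketched as follows, assuming Python 3.7 or later. The names run_filter and extract_timexes are hypothetical, and the command passed to run_filter is a placeholder, since the exact way TimeTag.pl is invoked is not shown here. The sketch also assumes that the lex tags carry begin and end attributes, which GUTime itself does not require.

```python
import subprocess
import xml.etree.ElementTree as ET


def run_filter(command, text):
    """Pipe text into the command's stdin and return its stdout."""
    result = subprocess.run(command, input=text, capture_output=True,
                            text=True, check=True)
    return result.stdout


def extract_timexes(output):
    """Return one dict per TIMEX3 element that wraps lex elements."""
    timexes = []
    root = ET.fromstring(output)
    for timex in root.iter("TIMEX3"):
        lexes = timex.findall(".//lex")
        if not lexes:
            # The TIMEX3 inside DATE wraps plain text, not lex elements.
            continue
        timexes.append({
            # The span runs from the first wrapped token to the last one.
            "begin": int(lexes[0].get("begin")),
            "end": int(lexes[-1].get("end")),
            "attrs": dict(timex.attrib),
        })
    return timexes
```

Taking the offsets from the first and last wrapped lex element is one way to recover a character span for each TIMEX3 tag, since the tag itself carries no begin and end attributes in the output above.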

As with the preprocessor results, the new TIMEX3 tags are exported to the tags TagRepository on the TarsqiDocument. One difference is that the GUTimeWrapper adds tags using the add_tag method on TagRepository, which also adds each tag to the opening_tags and closing_tags dictionaries, so a separate call to index() is not needed.
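
The difference between adding a tag with and without indexing can be illustrated with a toy model. This is not the toolkit's TagRepository: the classes ToyTag and ToyTagRepository, the append method, and the flat offset-to-list layout of the two dictionaries are all simplifying assumptions; only the names add_tag, index, opening_tags and closing_tags come from the description above.

```python
from collections import defaultdict


class ToyTag:
    """A tag with a name, character offsets and attributes."""

    def __init__(self, name, begin, end, attrs):
        self.name = name
        self.begin = begin
        self.end = end
        self.attrs = attrs


class ToyTagRepository:
    """Simplified stand-in for a tag repository with two offset indexes."""

    def __init__(self):
        self.tags = []
        self.opening_tags = defaultdict(list)  # begin offset -> tags
        self.closing_tags = defaultdict(list)  # end offset -> tags

    def _index_tag(self, tag):
        self.opening_tags[tag.begin].append(tag)
        self.closing_tags[tag.end].append(tag)

    def append(self, tag):
        """Store a tag without indexing it (hypothetical method)."""
        self.tags.append(tag)

    def index(self):
        """Rebuild both dictionaries from the stored tags."""
        self.opening_tags.clear()
        self.closing_tags.clear()
        for tag in self.tags:
            self._index_tag(tag)

    def add_tag(self, name, begin, end, attrs):
        """Store a tag and index it immediately."""
        tag = ToyTag(name, begin, end, attrs)
        self.tags.append(tag)
        self._index_tag(tag)
        return tag


repo = ToyTagRepository()
timex = repo.add_tag("TIMEX3", 15, 21, {"tid": "t1", "TYPE": "DATE"})
```

In this model, a tag stored with append is invisible to offset lookups until index() runs, whereas a tag stored with add_tag can be looked up right away.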