A part of speech tagger identifies the nouns, verbs, adjectives, etc. in a sentence. There are several programs that do this online, an example being https://parts-of-speech.info.

I am looking for a package that does this for LaTeX documents, which include mathematical equations. Ideally, the verbs in the output PDF would be highlighted in one colour, the nouns in another, and so on. This question on Stack Overflow appears to have a similar aim, whilst this question wants to do the same thing with nominalizations.

My question is: does such a package exist?

  • Using PythonTeX and packages such as NLTK, it should be theoretically possible, I guess. – Jan 31 '21 at 08:43
  • Using LuaLaTeX with a Lua package for PoS tagging may work too, for example https://github.com/torch/senna. – Marijn Jan 31 '21 at 09:44
  • That seems beyond the basic capability of LaTeX. You would need an exhaustive dictionary that contains each word's part of speech (of course, some words can be several). Then, on the fly, LaTeX would have to consult the dictionary for each new word of input, trying to find a match. It might be doable if you could limit the number of words to address, but each word to be processed would need an entry in some sort of part-of-speech database. – Steven B. Segletes Feb 04 '21 at 10:22
  • There is one alternative, in which, instead of trying to link in to a parts-of-speech dictionary, you could specify it as part of the input stream: [s]I [v]bought [i]Mom [d]flowers, where [s] is subject, [v] is verb, etc. Then, your task would be to run this input stream through an environment which would strip off the parts of speech and change the color of the next word on that basis. That sort of defeats the purpose of the question, but if there is interest in that, I am sure I could throw something together to demonstrate the idea. – Steven B. Segletes Feb 04 '21 at 10:23
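
The manually tagged input described in the last comment can be sketched as a small preprocessing step, for instance in Python (which could in principle be called through PythonTeX). This is only a rough illustration under stated assumptions, not an existing package: the function name `colourise`, the `TAG_COLOURS` mapping and the colour choices are all hypothetical, and the output relies on `\textcolor` from the xcolor package being loaded in the LaTeX document.

```python
import re

# Hypothetical tag-to-colour mapping; the tags follow the comment's example
# ([s] subject, [v] verb, [i] indirect object, [d] direct object).
# The colour names are xcolor base colours, chosen arbitrarily.
TAG_COLOURS = {"s": "blue", "v": "red", "i": "teal", "d": "orange"}

# A single-letter tag in square brackets followed immediately by one word.
TAGGED_WORD = re.compile(r"\[([a-z])\](\w+)")


def colourise(text: str) -> str:
    """Replace each [tag]word with \\textcolor{colour}{word} (xcolor syntax)."""

    def replace(match: re.Match) -> str:
        tag, word = match.group(1), match.group(2)
        colour = TAG_COLOURS.get(tag)
        if colour is None:
            # Unknown tag: leave the input untouched.
            return match.group(0)
        return rf"\textcolor{{{colour}}}{{{word}}}"

    return TAGGED_WORD.sub(replace, text)


print(colourise("[s]I [v]bought [i]Mom [d]flowers"))
```

The printed line is LaTeX source in which the tags are gone and each tagged word is wrapped in `\textcolor`. One caveat of this simple pattern: a bracketed optional argument inside mathematics that happens to use one of the tag letters, such as `\sqrt[d]x`, would be mistaken for a tag, so a real tool would have to skip math mode.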
