I have a dataset in this form: sentence = {sentence1, sentence2, sentence3, ...}
and a list of labels: label = {1, 2, 3, 4, 5, 6, 7, ...}. Both lists have the same length.
I want to pair each element of the first list with the corresponding element of the second, so that sentence1 goes with label 1, sentence2 with label 2, and so on.
So I tried this code:
ruleData = Table[Rule[sentence,label], {i, 1, Length[sentence]}]
so that when I evaluate ruleData[[7,1]], it gives me sentence7 (since label7 is 7),
but my code is not working. Any hints?
... sentence[[i]] etc. in your table? P.S. check AssociationThread[label, sentence] – Kuba Apr 08 '19 at 13:23

I now want to apply TF-IDF to this variable ruleData, so I use this code: TFIDF = FeatureExtraction[ Join[First /@ Keys@ruleData[[All]], Last /@ Keys@ruleData[[All]]], "TFIDF"] but the output tells me that there is a nonatomic expression. I wanted to apply the tf-idf to each sentence and to the total of sentences. Any ideas on how to solve it? – Tom Peterson Apr 08 '19 at 13:37

... Keys out of your association and then mapping First over it. Keys are usually atomic expressions (numbers, strings, etc.) and you cannot apply First and/or Last to atoms. It's difficult to understand what you're trying to do from a comment like this. – Sjoerd Smit Apr 08 '19 at 13:45

... computing the number of times that a word appears in a sentence (term frequency) in relation to the number of times that word appears in all other sentences (document frequency). In other words, counting the times a word appears in a given sentence but reducing its importance if it appears in many other sentences. – Tom Peterson Apr 08 '19 at 13:59

ruleData = AssociationThread[{sentence1, sentence2, sentence3}, {1, 2, 3}]. You may want to flip those around so you can access ruleData[3] to return sentence3. To generate the TF-IDF, you can use GroupBy's to group documents, and words once they are tokenized/normalized. Please ask as a separate question with a minimal dataset. – alancalvitti Apr 08 '19 at 14:06
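
For illustration, here is a minimal sketch of what the comments suggest. It assumes the sentences are plain strings; the three sample sentences and the names byLabel, tokens, tf, df, nDocs and tfidf are made up for this example. The tf-idf part uses the textbook weighting tf × log(N/df) with naive whitespace tokenization, which is not necessarily what FeatureExtraction's "TFIDF" method computes internally.

```wolfram
(* Hypothetical sample data; the real sentence list is assumed to hold strings *)
sentence = {"the cat is grey", "my cat is fast", "the big dog"};
label = {1, 2, 3};

(* List of rules sentence -> label, pairing the two lists element by element *)
ruleData = Thread[sentence -> label];
ruleData[[3, 1]]   (* left-hand side of the 3rd rule: "the big dog" *)

(* Association keyed by label, so a label looks up its sentence *)
byLabel = AssociationThread[label, sentence];
byLabel[3]         (* "the big dog" *)

(* Hand-rolled tf-idf with naive lowercase, whitespace-based tokenization *)
tokens = StringSplit /@ ToLowerCase[sentence];
tf = Counts /@ tokens;                             (* term counts per sentence *)
df = Counts[Flatten[DeleteDuplicates /@ tokens]];  (* number of sentences containing each word *)
nDocs = Length[sentence];
tfidf = Function[counts,
    Association[KeyValueMap[#1 -> #2*Log[N[nDocs]/df[#1]] &, counts]]] /@ tf;
tfidf[[1]]["grey"]  (* Log[3.]: "grey" occurs once, in one of three sentences *)
```

The original Table call fails because sentence and label are never indexed by i, so every entry is the same rule between the two whole lists; Thread (or Table with sentence[[i]] -> label[[i]]) pairs them position by position. With the association keyed by label, byLabel[7] would play the role of ruleData[[7,1]].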