I'd like to get the top 50, the bottom 50 and an ordered list of all Wolfram Language symbols based on their "Ranks".
With

allWLS = EntityList["WolframLanguageSymbol"];

this returns the top 50

Pick[allWLS, 
 UnitStep[Replace[EntityValue["WolframLanguageSymbol", "Ranks"], 
     m_Missing -> {{Infinity, Infinity}}, {1}][[All, 1, 2]] - 51], 0]

this should return the bottom 50

maxRank = Max[Replace[EntityValue["WolframLanguageSymbol", "Ranks"], 
 m_Missing -> {{0, 0}}, {1}][[All, 1, 2]]]

Pick[allWLS, 
 UnitStep[Replace[EntityValue["WolframLanguageSymbol", "Ranks"], 
     m_Missing -> {{0, 0}}, {1}][[All, 1, 2]] - (maxRank - 51)], 1]

but returns the bottom 38 (all having an "All" rank of 4469), and

orderedWLS = 
  allWLS[[Ordering[
     Replace[EntityValue["WolframLanguageSymbol", "Ranks"], 
       m_Missing -> {{Infinity, Infinity}}, {1}][[All, 1, 2]]]]];

returns a list of all Wolfram Language symbols ordered by their "All" rank.
The bottom 50 can then be found with

orderedWLS[[-50 - # ;; -# - 1]] &@
 Count[EntityValue["WolframLanguageSymbol", "Ranks"], _Missing]
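
The shortfall above (38 symbols instead of 50) is what a value threshold produces when ranks are tied or have gaps: it keeps every entry whose rank value lies in a window, not a fixed number of entries. A minimal offline sketch, where toyNames and toyRanks are made-up data and not real "Ranks" values:

```wolfram
(* Hypothetical toy data: five symbols, two of them tied at the worst rank *)
toyNames = {"a", "b", "c", "d", "e"};
toyRanks = {1, 2, 3, 20, 20};

(* Value threshold, as in the UnitStep approach: asks for the "bottom 3"
   but keeps only ranks >= Max[toyRanks] - 3 = 17 *)
Pick[toyNames, UnitStep[toyRanks - (Max[toyRanks] - 3)], 1]
(* {"d", "e"} *)

(* Position-based selection: always returns exactly 3 entries *)
toyNames[[Ordering[toyRanks, -3]]]
(* {"c", "d", "e"} *)
```

The position-based form is what the Ordering approach above relies on, which is presumably why it yields a full 50.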

My questions are:

  1. Is there a faster way to do this within the Entity framework? Using Replace in particular feels odd here. There should be a way to get, for example, the top 50 by applying Interval[{1, 50}] directly to the entities, similar to the following example for "VersionIntroduced":

    WolframLanguageData[
     EntityClass[
      "WolframLanguageSymbol", {"VersionIntroduced" -> Interval[{10.2, 10.3}]}]]
    
  2. Is there a faster way to do this by bypassing entities completely or partially (e.g. by preprocessing the downloaded data)?

Karsten7
  • The problem is that WolframLanguageData[] itself uses Entity[] to return results, and the ranks seem to only be accessible through Alpha or the corresponding EntityValue[]. – J. M.'s missing motivation Oct 23 '15 at 05:15
  • @J.M. I was thinking of using something like WolframLanguageData[ EntityClass[ "WolframLanguageSymbol", {"VersionIntroduced" -> Interval[{10.2, 10.3}]}]], but couldn't make it work for "Ranks". – Karsten7 Oct 23 '15 at 05:29
  • @J.M. To me the real problem (with respect to 1.) seems to be that "Ranks" are lists and not single values. – Karsten7 Oct 23 '15 at 05:42
  • I think that's because the ranks aren't for one field, but for several; e.g. WolframLanguageData["Sin", "Ranks"]. My comment was more intended to address point 2; that is, I don't see any obvious way to avoid going online just to get the function ranks. – J. M.'s missing motivation Oct 23 '15 at 05:45
  • @J.M. An example where multiple fields are no problem WolframLanguageData["Plot", "FunctionalityAreas"], WolframLanguageData[ EntityClass[ "WolframLanguageSymbol", {"FunctionalityArea", "PlottingFunctions"}]]. – Karsten7 Oct 23 '15 at 07:03
  • @Karsten @J.M. I was able to copy, paste and execute the code. Could you enlighten me as to what the ranks of the *WolframLanguageSymbol* represent? A complete guess on my part is how many hits they are getting on various web sites? This is my first exposure to Entity, one of the things I like about Stack Exchange is to broaden my horizons. – Jack LaVigne Oct 23 '15 at 16:09
  • @Jack, I interpreted them as how "popular" a function is as used in a particular field/domain. For instance, if you execute WolframLanguageData["Sin", "Ranks"], you'll see that one of the entries is StackExchange. – J. M.'s missing motivation Oct 23 '15 at 16:12
  • @J.M. Thank you for the reply. That seems reasonable, the number one rank symbol is List and the number two Rule using All as the rank. – Jack LaVigne Oct 23 '15 at 16:57
  • @JackLaVigne EntityValue["WolframLanguageSymbol", "Ranks", "Description"] returns "ranks of usage". – Karsten7 Oct 25 '15 at 08:12

3 Answers

This gives the 5 symbols with the highest rank in "All":

In[1]:= EntityValue["WolframLanguageSymbol", "Ranks", 
  "EntityAssociation"] // Query[TakeSmallest[5] /* Keys, "All"]

Out[1]= {Entity["WolframLanguageSymbol", "List"], 
 Entity["WolframLanguageSymbol", "Rule"], 
 Entity["WolframLanguageSymbol", "Times"], 
 Entity["WolframLanguageSymbol", "Power"], 
 Entity["WolframLanguageSymbol", "Plus"]}
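
To see what the query does without downloading anything, here is a sketch on a hypothetical nested association. The name toy and its contents are made up; the real "EntityAssociation" result has entities as keys and may differ in detail.

```wolfram
(* Hypothetical stand-in for the downloaded data *)
toy = <|"x" -> <|"All" -> 3, "StackExchange" -> 1|>,
        "y" -> <|"All" -> 1, "StackExchange" -> 3|>,
        "z" -> <|"All" -> 2, "StackExchange" -> 2|>|>;

(* "All" extracts that key from every value; TakeSmallest[2] then keeps
   the two smallest ranks, and Keys returns the corresponding keys *)
Query[TakeSmallest[2] /* Keys, "All"][toy]
(* {"y", "z"} *)

(* The same steps written without Query *)
Keys[TakeSmallest[toy[[All, "All"]], 2]]
(* {"y", "z"} *)
```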

It is currently not possible to get the same result without retrieving all the data because a query like

EntityList[EntityClass[type, "property" -> value]]

is executed only when value is a simple expression such as a number, quantity, or entity, or one of a selection of operators such as ContainsAny[{entity1, ...}] or GreaterThan[x].
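
As an illustration of a value specification of the supported kind, here is a sketch that combines the "VersionIntroduced" property from the question with the GreaterThan operator named above. It needs network access, the threshold 10.2 is arbitrary, and operator support may depend on the Wolfram Language version, so treat it as an assumption rather than a tested recipe.

```wolfram
(* Symbols introduced after version 10.2; the threshold is an arbitrary example *)
EntityList[
 EntityClass["WolframLanguageSymbol", 
  "VersionIntroduced" -> GreaterThan[10.2]]]
```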

Toni

The "Corpus" qualifier can be used to generate an efficient query.

EntityList[
 EntityClass["WolframLanguageSymbol", 
  EntityProperty["WolframLanguageSymbol", "Ranks", {"Corpus" -> "All"}] -> 
   TakeSmallest[50]]]

The documentation now includes examples of this.
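
By symmetry, the bottom 50 might be obtained by swapping in TakeLargest. This is an untested sketch: it assumes TakeLargest is accepted in the same position as TakeSmallest, and it does not address how symbols with missing or tied ranks are treated.

```wolfram
(* Assumption: TakeLargest[50] selects the 50 numerically largest "All" ranks *)
EntityList[
 EntityClass["WolframLanguageSymbol", 
  EntityProperty["WolframLanguageSymbol", "Ranks", {"Corpus" -> "All"}] -> 
   TakeLargest[50]]]
```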

Karsten7

One possibility is to convert the result of EntityValue[] into a Dataset[] and then perform ranking/sorting queries. Here is what I came up with:

wlranks = Dataset[Association /@ MapAt["Name" -> # &, Prepend[#2, #1] & @@@
                  DeleteMissing[EntityValue["WolframLanguageSymbol",
                                            {"Name", "Ranks"}], 1, 1], {All, 1}]]

Here's how to query the top 50 functions by their "All" ranking:

wlranks[TakeSmallestBy[#All &, 50], "Name"] // Normal
   {"List", "Rule", "Times", "Power", "Plus", "Set", "Alternatives", "Null",
    "Blank", "NoWhitespace", "Pattern", "$Failed", "CompoundExpression", "Slot",
    "Part", "Sqrt", "RGBColor", "None", "Pi", "SetDelayed", "Function", "Equal",
    "Subscript", "Automatic", "True", "Directive", "I", "Map", "Opacity",
    "RuleDelayed", "FinancialData", "GrayLevel", "Sin", "False", "If", "Hold",
    "Quantity", "ReplaceAll", "CityData", "Line", "Cos", "Condition", "Less",
    "Style", "And", "E", "Table", "HoldComplete", "Word", "Length"}

One problem I noticed with querying the bottom 50 is that there seem to be many ties at the bottom. With that caveat, you can use TakeLargestBy[] to extract the bottom 50. A list of the functions sorted by their "All" rank is returned by wlranks[SortBy[#All &], "Name"] // Normal. Similar operations can be done for the other ranks, e.g. "StackExchange":

wlranks[TakeSmallestBy[#StackExchange &, 50], "Name"] // Normal
   {"List", "Times", "Set", "Power", "Rule", "Blank", "Pattern", "Slot",
    "CompoundExpression", "Plus", "Part", "Function", "SetDelayed", "Map",
    "Equal", "Null", "Pi", "ReplaceAll", "Table", "Apply", "Sin", "All", "True",
    "Length", "Sqrt", "RuleDelayed", "Range", "Cos", "If", "Derivative", "False",
    "First", "Less", "Flatten", "Greater", "Module", "Plot", "Transpose",
    "PlotRange", "I", "None", "Red", "LessEqual", "And", "ImageSize",
    "StringCases", "With", "Graphics", "PatternTest", "Dynamic"}
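
The tie caveat can be inspected offline on toy data. The sketch below uses a plain list of associations instead of a Dataset to keep it simple; toyRows is made up and does not reflect real rank values.

```wolfram
(* Hypothetical toy rows: three symbols share the worst "All" rank *)
toyRows = {<|"Name" -> "a", "All" -> 1|>, <|"Name" -> "b", "All" -> 2|>,
   <|"Name" -> "c", "All" -> 7|>, <|"Name" -> "d", "All" -> 7|>,
   <|"Name" -> "e", "All" -> 7|>};

(* Asking for the bottom 2 returns only two of the three tied rows *)
TakeLargestBy[toyRows, #All &, 2]

(* How many rows share each rank *)
Counts[Lookup[toyRows, "All"]]
(* <|1 -> 1, 2 -> 1, 7 -> 3|> *)

(* All rows tied for the worst rank *)
Lookup[MaximalBy[toyRows, #All &], "Name"]
(* {"c", "d", "e"} *)
```

Counting the ties first makes it clear whether a fixed cutoff such as 50 splits a group of equally ranked symbols.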
J. M.'s missing motivation
  • What does this mean, the "rank" of a symbol? Just how frequently it appears in a given codebase? – Oleksandr R. Oct 25 '15 at 02:58
  • @Oleksandr, yes, that was my reading of it; taking the last one as an example, List is the most popular Mathematica function in StackExchange posts, followed by Times. – J. M.'s missing motivation Oct 25 '15 at 03:02
  • @OleksandrR. EntityValue["WolframLanguageSymbol", "Ranks", "Description"] returns "ranks of usage". One can get the "frequencies of usage" by using "Frequencies" instead of "Ranks". – Karsten7 Oct 25 '15 at 08:16
  • One can avoid making two downloads and save some time by using EntityValue["WolframLanguageSymbol", {"Name", "Ranks"}] instead of WolframLanguageData[WolframLanguageData[], {"Name", "Ranks"}]. – Karsten7 Oct 25 '15 at 08:18
  • @Karsten, I've changed it. Thanks! – J. M.'s missing motivation Oct 25 '15 at 12:24
  • @Karsten7. the problem is that I don't think "ranks of usage" really means anything. If this is what is intended, it would have been far clearer to write "rank order by frequency of usage in a given corpus". "Ranks" can easily be understood as identifying tensor-valued functions such as D, or relating to any of the higher-order functions. – Oleksandr R. Oct 25 '15 at 16:40
  • @Oleksandr, they certainly could have picked a less ambiguous name for the field. – J. M.'s missing motivation Oct 25 '15 at 16:42