I would like to prefetch all entity types. How can I speed up computations by prefetching all entity types? This question differs from other questions because I am interested in the exact mechanisms and steps needed to prefetch all entity types, and in how long it might take.

Peter Burbery

1 Answer

Use EntityValue[] to list all entity types.

EntityValue[]

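Note that the documentation guide page EntityTypes does not include all entity types; for example, it omits goat breeds (GoatBreed). EntityPrefetch does not have the attribute Listable, so it does not automatically thread over a list of entity types and you have to map it over the list yourself. To verify this, evaluate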

MemberQ[Attributes[EntityPrefetch], Listable]

Prefetching every entity type will take a long time, so you probably want to run it in a separate instance of Mathematica, in the background, so that the evaluation does not block the kernel you are working in. There might be a way to do this in one installation with multiple kernels, but I just use a second instance of Mathematica running at the same time as the one I'm working in. You can use EchoTiming or Timing to display how long the evaluation took.
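For the single-installation route, one option that may work is LocalSubmit, which submits an evaluation to a separate local kernel. The following is only a minimal sketch, not a tested recipe. It assumes that LocalSubmit is available in your version, that the "TaskFinished" event and the "EvaluationResult" key behave as they do for other task functions, and that data prefetched by the background kernel ends up in a cache the main session also reads. The variable names are placeholders.

```wolfram
(* Assumption: LocalSubmit evaluates the expression in a separate local kernel,
   so the kernel behind the notebook stays free for other work. *)
task = LocalSubmit[
   EntityPrefetch["FictionalPlace"],
   (* Assumption: when the task finishes, the handler receives an association
      that contains the keys requested in HandlerFunctionsKeys. *)
   HandlerFunctions -> <|
     "TaskFinished" -> ((prefetchResult = #EvaluationResult) &)
   |>,
   HandlerFunctionsKeys -> {"EvaluationResult"}
];

(* Evaluate this later: it holds the prefetch result once the task has finished. *)
prefetchResult
```

If this does not behave as assumed on your system, the second-instance approach described above remains the fallback.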

Some entity types, like Star, Food, and Flight, return a failure with EntityPrefetch. I suspect the reason is that they rely on real-time, real-world data in some way, so there is nothing fixed to prefetch. For example, the data for FlightData (introduced in version 13), which returns Flight entities, comes from the Federal Aviation Administration, which must have some sort of API endpoint that Mathematica is calling.

Below is a way to prefetch all entity types. I'm not sure exactly where the prefetched files are stored; I think they are in either $BaseDirectory or $UserBaseDirectory. One thing I will mention is that the Plant entities take up something like 24 GB, so you might want to use

Complement[EntityValue[], {"Plant"}]

in this case instead of EntityValue[] if plants are not needed.
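Since the storage location is uncertain, one way to look for it is to search both base directories for names that mention entities. This is only a sketch: the "*Entity*" name pattern and the search depth of 3 are guesses, not documented facts about where the cache lives.

```wolfram
(* Search up to three levels below both base directories for names containing
   "Entity". The pattern is a guess, not a documented cache location. *)
candidates = FileNames[
   "*Entity*",
   {$BaseDirectory, $UserBaseDirectory},
   3,
   IgnoreCase -> True
];

(* Keep only the directories among the matches. *)
Select[candidates, DirectoryQ]
```

Comparing the size of any matching directory before and after a small prefetch would be one way to confirm whether it is the right place.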

Here is an example with just two entity types, FictionalPlace (like Naboo in Star Wars) and FictionalSpecies (like house-elf in Harry Potter):

AssociationMap[
 input |-> Timing[EntityPrefetch[input]],   
 {"FictionalSpecies", "FictionalPlace"}
]

This returns

<|"FictionalSpecies" -> {0.03125, Success[
   "Prefetch", <|"MessageTemplate" -> "Prefetch successful.", 
     "Values" -> 510, "Type" -> "FictionalSpecies"|>]}, 
 "FictionalPlace" -> {0.015625, Success[
   "Prefetch", <|"MessageTemplate" -> "Prefetch successful.", 
     "Values" -> 644, "Type" -> "FictionalPlace"|>]}|>
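One caveat about these numbers: Timing reports CPU time used by the kernel, not elapsed wall-clock time, so time spent waiting on a download may not be counted. The sketch below shows the difference using Pause as a stand-in for waiting, then repeats the same map with AbsoluteTiming. The variable names are placeholders.

```wolfram
(* Pause waits without using CPU, so Timing reports close to 0 seconds
   while AbsoluteTiming reports about 1 second. *)
cpuSeconds = First[Timing[Pause[1]]];
wallSeconds = First[AbsoluteTiming[Pause[1]]];
{cpuSeconds, wallSeconds}

(* Same map as above, but recording elapsed time for each entity type. *)
AssociationMap[
 type |-> AbsoluteTiming[EntityPrefetch[type]],
 {"FictionalSpecies", "FictionalPlace"}
]
```

If the goal is to estimate how long a full prefetch will keep a session busy, elapsed time is probably the more useful figure.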

You can use EchoTiming to figure out the total time:

EchoTiming[
 AssociationMap[
  input |-> Timing[EntityPrefetch[input]], {"FictionalSpecies", 
   "FictionalPlace"}], "Total time to prefetch all entities"]

To do this with all entity types, you could do something like

EchoTiming[
 AssociationMap[
  input |-> Timing[EntityPrefetch[input]], EntityValue[]], "Total time to prefetch all entities"]

The entity prefetch is finished.

Here is some data on what took the longest: [image]

Out of 351 entity types, 49 failed and 302 were successful: [image]

Here are the entity types with the most values: [image]
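Summaries like these can be computed directly from the association that AssociationMap returns. The following sketch uses the two-type result shown earlier as sample input, so it runs without network access. The names results, succeeded, failed, counts, slowest and mostValues are illustrative, and anything that is not a Success object is counted as a failure.

```wolfram
(* Sample input: the two-type result shown earlier. *)
results = <|
   "FictionalSpecies" -> {0.03125, Success["Prefetch",
      <|"MessageTemplate" -> "Prefetch successful.",
        "Values" -> 510, "Type" -> "FictionalSpecies"|>]},
   "FictionalPlace" -> {0.015625, Success["Prefetch",
      <|"MessageTemplate" -> "Prefetch successful.",
        "Values" -> 644, "Type" -> "FictionalPlace"|>]}
   |>;

(* Split the entries by whether the second element is a Success object. *)
succeeded = Select[results, MatchQ[#, {_, _Success}] &];
failed = Select[results, ! MatchQ[#, {_, _Success}] &];
counts = <|"Succeeded" -> Length[succeeded], "Failed" -> Length[failed]|>

(* Timing per entity type, longest first. *)
slowest = ReverseSort[First /@ results]

(* Number of values per successfully prefetched type, largest first. *)
mostValues = ReverseSort[
  Replace[{_, Success[_, info_]} :> info["Values"]] /@ succeeded
]
```

Replacing the sample association with the result of the full run over EntityValue[] should give the corresponding counts and rankings for all entity types.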

It seems that a little less than half of the entity types that can't be prefetched have dedicated Wolfram Language functions like FlightData: [image]

Maybe that's why some of them can't be prefetched, but it does not explain all of them. I'm not sure why you can't prefetch forest, for example.

Peter Burbery