I'm looking for datasets for evaluating algorithms that find the top-k elements in data streams.

I currently have network traces from CAIDA and some self-generated datasets of i.i.d. samples drawn from a Zipf distribution.

I'm looking for real-life datasets that are heavy-tailed, i.e., for any fixed k, the top-k elements account for only a small portion of the stream.
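
To make the criterion above concrete, here is a minimal sketch of how the top-k share of a stream could be measured, together with one possible way of generating the i.i.d. Zipf-like data mentioned earlier. The function names `zipf_stream` and `top_k_share`, the finite universe, and the parameter values are illustrative assumptions, not part of any specific benchmark.

```python
import random
from collections import Counter


def zipf_stream(n_items, universe, skew, seed=0):
    """Draw n_items i.i.d. samples from a Zipf-like law over 1..universe.

    Assumption: a finite universe, with rank i weighted by 1 / i**skew.
    """
    rng = random.Random(seed)
    items = range(1, universe + 1)
    weights = [1.0 / (i ** skew) for i in items]
    return rng.choices(items, weights=weights, k=n_items)


def top_k_share(stream, k):
    """Fraction of the stream accounted for by its k most frequent elements."""
    counts = Counter(stream)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(k))
    return top / total


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    stream = zipf_stream(100_000, universe=10_000, skew=0.8)
    for k in (10, 100, 1000):
        print(k, top_k_share(stream, k))
```

This computes exact counts offline, so it is only a way to check whether a candidate dataset meets the criterion, not a streaming algorithm itself. A dataset fits what I'm after if the reported share stays small for the values of k of interest.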

Can anyone suggest heavy-tailed datasets that are available for academic research and are used for evaluating streaming algorithms?

philshem
R B
  • I'm voting to close this question as off-topic because it would be better answered in the OpenData SE – Dawny33 Nov 16 '15 at 16:58
  • Thanks for your comment @Dawny33. Should I delete it and re-post it there? Can we migrate it? Any other suggestions? –  Nov 16 '15 at 17:17
  • Yes, it can be migrated. The moderator would look into this soon :) – Dawny33 Nov 16 '15 at 17:32

0 Answers