Heavy tailed dataset for heavy hitters problem

Asked Nov 16 '15 at 16:54

Active Jul 29 '16 at 16:26


I'm looking for datasets for evaluating algorithms that find the top-k elements in data streams.

I currently have a network trace from CAIDA and some self-generated datasets of i.i.d. samples drawn from a Zipf distribution.
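For context, here is a minimal sketch of how such a synthetic stream might be generated. The function name `zipf_stream` and its parameters (`universe`, `skew`, `seed`) are illustrative assumptions, not the exact generator used for the datasets mentioned above.

```python
import random

def zipf_stream(n_items, universe, skew, seed=0):
    """Draw n_items i.i.d. samples from a Zipf-like law over ranks 1..universe.

    The probability of rank r is proportional to 1 / r**skew.
    All parameter names here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)                      # local RNG so runs are reproducible
    ranks = list(range(1, universe + 1))
    weights = [1.0 / (r ** skew) for r in ranks]   # unnormalized Zipf weights
    # random.choices samples with replacement, so the draws are independent
    return rng.choices(ranks, weights=weights, k=n_items)
```

A smaller `skew` spreads the mass over more elements, which is the regime of interest here.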

I'm looking for real-life datasets that are heavy-tailed, i.e., for any fixed k, the top-k elements account for only a small portion of the stream.
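To make this criterion concrete, the share of the stream taken by its k most frequent elements can be measured offline. The sketch below is only one possible check; the helper name `top_k_share` is an assumption introduced for illustration.

```python
from collections import Counter

def top_k_share(stream, k):
    """Return the fraction of the stream occupied by its k most frequent elements."""
    counts = Counter(stream)            # exact frequencies (offline, not a streaming sketch)
    total = sum(counts.values())
    if total == 0:
        return 0.0                      # empty stream: define the share as zero
    top = sum(c for _, c in counts.most_common(k))
    return top / total
```

For example, `top_k_share(["a", "a", "b", "c"], 1)` returns `0.5`. A dataset fits the description above when this value stays small for the values of k under consideration.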

Any suggestions for heavy-tailed datasets that are available for academic research and are used for evaluating streaming algorithms?

edited Jul 29 '16 at 16:26

philshem


asked Nov 16 '15 at 16:54

R B

I'm voting to close this question as off-topic because it would be better answered in the OpenData SE – Dawny33 Nov 16 '15 at 16:58
Thanks for your comment @Dawny33. Should I delete it and re-post it there? Can we migrate it? Any other suggestions? – Nov 16 '15 at 17:17
Yes, it can be migrated. The moderator would look into this soon :) – Dawny33 Nov 16 '15 at 17:32


0 Answers