In a set, what is the term to describe the number of unique values divided by the total number of values?

Question

The closest word I can think of is "uniqueness", although I believe there is a more specific mathematical term.

Say we have a set/table of data with two columns that describes cars. One column is VIN and the other is color.

VIN        Color
123456789  Blue
987654321  Brown
597348473  Green
789132654  Blue

In the VIN column we have four unique values divided by four total values for a result of 1. In the color column we have three unique values divided by four total for a result of 0.75.

Therefore, the VIN column has a higher level of uniqueness than the color column.
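To make the arithmetic concrete, here is a minimal Python sketch of the ratio described above, applied to the two columns from the table. The function name `distinct_ratio` is made up for illustration and is not an established term or library call.

```python
def distinct_ratio(values):
    """Return (number of distinct values) / (total number of values)."""
    values = list(values)
    if not values:
        # The ratio is undefined for an empty column (division by zero).
        raise ValueError("ratio is undefined for an empty column")
    return len(set(values)) / len(values)


# The two columns from the example table.
vins = ["123456789", "987654321", "597348473", "789132654"]
colors = ["Blue", "Brown", "Green", "Blue"]

print(distinct_ratio(vins))    # 1.0  (4 distinct / 4 total)
print(distinct_ratio(colors))  # 0.75 (3 distinct / 4 total)
```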

I know the concept is a purely mathematical one, which is why I am posting the question here. However, it has applications in database design, so please forgive me for overlapping different StackExchange communities.

The concept is used to some degree when choosing effective database indexes. The idea is that you want indexes on columns with a high degree of uniqueness, so in this example VIN would be a good index candidate and color would probably be a poor candidate.

Technically, we should be considering a multiset and not a set. A set does not count multiplicity so that {1,2,2,3} and {1,2,3} are the same. However, the multisets [1,2,2,3] and [1,2,3] are different. For what it's worth, I've always referred to the quantity you suggest as "diversity". — MRicci, Jul 03 '14 at 17:32
Thanks MRicci. I should have warned in advance that I do not have a strong background in math and some terminology I use may be offensive to the knowledgeable :-) — Will, Jul 03 '14 at 17:37

score 1 · Answer 1 · edited Apr 13 '17 at 12:20

According to this question on SE, the quantity you describe is the ratio of the dimension of a multiset (its number of distinct elements) to its cardinality (its total number of elements, counted with multiplicity). As I said in my comment, I think a good name for this quantity is the "diversity" of the multiset.

Here is a link to info on multisets. In practice, you'll probably be working with arrays in some programming language, though you might find the multiset stuff interesting.
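As a rough sketch of how this might look with arrays in practice, Python's `collections.Counter` can stand in for a multiset: the number of keys plays the role of the dimension and the sum of the counts plays the role of the cardinality. The function name `diversity` simply reuses the informal name suggested above; it is not a standard library function.

```python
from collections import Counter


def diversity(items):
    """Ratio of distinct elements to total elements, counted with multiplicity."""
    counts = Counter(items)              # element -> multiplicity
    dimension = len(counts)              # number of distinct elements
    cardinality = sum(counts.values())   # total number of elements
    if cardinality == 0:
        raise ValueError("diversity is undefined for an empty multiset")
    return dimension / cardinality


# As sets, {1,2,2,3} and {1,2,3} are equal; as multisets they differ.
print({1, 2, 2, 3} == {1, 2, 3})                    # True
print(Counter([1, 2, 2, 3]) == Counter([1, 2, 3]))  # False

print(diversity([1, 2, 2, 3]))  # 0.75
print(diversity([1, 2, 3]))     # 1.0
```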
