We have a few million 18S reads from a particular environment. The reads have been clustered into Operational Taxonomic Units (OTUs), and the OTUs annotated against a reference database.
To generate a rarefaction curve, my understanding is that one randomly samples $n$ reads, where $n$ ranges (with some step size) from a small depth up to the total number of reads, and counts the number of distinct OTUs observed at each such sub-sampling depth.
Which of the following two approaches, as implemented by sequence analysis suites such as QIIME and mothur, is standard practice? Which would be best to use in the situation above?
1. Treat the original assignments of reads to OTUs as truth, and when resampling $n$ reads, simply count the number of "original" OTUs observed in the sub-sample.
2. Re-cluster the sub-sampled reads, and then count the number of "new" OTUs in the sub-sample.
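For concreteness, here is a minimal sketch of the first approach as I understand it (all data and function names here are hypothetical, not taken from QIIME or mothur): each read keeps its original OTU label, reads are sub-sampled without replacement at increasing depths, and the distinct labels are counted at each depth.

```python
import random

def rarefaction_curve(read_otus, step, seed=0):
    """read_otus: list of OTU labels, one per read (original assignments).

    Returns a list of (depth, observed_otu_count) pairs, sub-sampling
    reads without replacement at each depth.
    """
    rng = random.Random(seed)
    curve = []
    for n in range(step, len(read_otus) + 1, step):
        subsample = rng.sample(read_otus, n)  # draw n reads without replacement
        curve.append((n, len(set(subsample))))
    return curve

# Toy example: 100 reads assigned to 3 OTUs with uneven abundances.
reads = ["otu_a"] * 50 + ["otu_b"] * 30 + ["otu_c"] * 20
curve = rarefaction_curve(reads, step=25)
print(curve)
```

In practice one would typically repeat the draw at each depth and average the counts to smooth the curve; at the full depth the sub-sample is the whole dataset, so the count always equals the total number of original OTUs.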
My sense from reading through the QIIME documentation is that method 1 is the standard, but I am not sure. I also do not quite understand why method 2 wouldn't be the better choice, aside from being computationally more expensive.