Tuesday, 25 February 2014

Top vs Sample


A question came up in my team's discussion forum: which of these two is the better way to get a sample of the data? Various online forums offered suggestions, but none explained why one would be better than the other.
I read through the TD documentation, and this is what I put forward to my group:

TOP:
1. The system evaluates the TOP n operator after all other clauses in the SELECT request have been evaluated. (So I think it doesn’t reduce data processing time; it should only affect the I/O time.)
2. It gives a ranked result only when you specify an ORDER BY clause. If you don’t specify ORDER BY, the result can be any n rows.
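
To make the second point concrete, here is a small sketch. The table sales_db.orders and its order_total column are made-up names for illustration, not something from the TD documentation.

```sql
-- Hypothetical table and column names, for illustration only.

-- No ORDER BY: any 10 rows may come back, with no ranking implied.
SELECT TOP 10 *
FROM sales_db.orders;

-- With ORDER BY: the 10 rows with the highest order_total.
SELECT TOP 10 *
FROM sales_db.orders
ORDER BY order_total DESC;
```

Only the second query is a true "top 10". The first is just a quick way to look at a handful of rows.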

SAMPLE:
1. The SAMPLE clause takes an optional keyword, ‘RANDOMIZED ALLOCATION’, which specifies that the retrieved rows are sampled randomly across the AMPs rather than in proportion to the number of qualified rows per AMP. Proportional allocation is the default.
2. SAMPLE also operates on the evaluated output, so it does not improve query processing time either.
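
For reference, a rough sketch of what the SAMPLE variants look like. Again, sales_db.orders is a made-up table name used only for illustration.

```sql
-- Hypothetical table name, for illustration only.

-- Default (proportional allocation): 10 rows, drawn from each AMP
-- in proportion to the number of qualified rows it holds.
SELECT *
FROM sales_db.orders
SAMPLE 10;

-- Randomized allocation: 10 rows sampled randomly across the AMPs.
SELECT *
FROM sales_db.orders
SAMPLE RANDOMIZED ALLOCATION 10;

-- A fraction instead of a row count: roughly 1% of the qualified rows.
SELECT *
FROM sales_db.orders
SAMPLE 0.01;
```

In both the default and the randomized form, the rows are picked from the result of the rest of the query, which is why neither is expected to shorten query processing.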
