Limitations of Oracle’s SAMPLE Clause When Selecting a Sample of Rows

In the IT world, I’ve often said that cost is always the same, it’s how you decide where to place your cost. When the Oracle dog-n-poniers bark about great new functionality and nice tricks to help you save time and hair follicles, I know there’s almost always a catch. The 1%-5% of the time there is no catch pays for the cost of Oracle, and the remaining 95%-99% is spent on developing the workarounds when you find out the trick is really no trick (or has bugs!).

The SAMPLE clause is no different. It’s also not new (I believe it came out in Oracle 8i?), but I recently was playing around with it in hoping that it could save time in validating data for some application changes I was making.

Using SAMPLE to Select a Percentage of Data in a Table

First, look it up in the manual … http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_10002.htm#i2065953

… and here it is in action. I want 10% of the rows in HR.EMPLOYEES:

select employee_id
from hr.employees
sample (10)
/
EMPLOYEE_ID
———–
122
141
159
186
197

Great! But that’s about where the greatness ends.

SAMPLE Limitation 1: Views/Complex Joins

You may not be able to use it in queries against views and/or in complex joins. It may require, and rightly so when you think about it, the underlying tables having primary keys so Oracle has some basis of figuring out how to do what you’re asking for. See below. First, I create copies of sample tables hr.employees and hr.jobs, create primary keys on each, and create a view joining the two. The SAMPLE clause works when selecting against the view. Second, I drop the primary key constraints. The SAMPLE clause does not work with the PK constraints gone.

create table my_employees
as
select * from hr.employees
/
Table created.

ALTER TABLE my_employees ADD (
CONSTRAINT MY_EMP_ID_PK
PRIMARY KEY
(EMPLOYEE_ID)
USING INDEX
TABLESPACE EXAMPLE
PCTFREE 10
)
/
Table altered.

create table my_jobs
as
select * from hr.jobs
/
Table created.

ALTER TABLE my_jobs ADD (
CONSTRAINT MY_JOB_ID_PK
PRIMARY KEY
(JOB_ID)
USING INDEX
TABLESPACE EXAMPLE
PCTFREE 10
)
/
Table altered.

create or replace view inferior_vw
as
select a.employee_id,b.job_id
from
my_employees a
,my_jobs b
where 1=1
and a.job_id=b.job_id
/
View created.

select * from inferior_vw sample (5)
/

EMPLOYEE_ID JOB_ID
———– ———-
122 ST_MAN
127 ST_CLERK
146 SA_MAN
151 SA_REP
181 SH_CLERK
192 SH_CLERK

alter table my_employees drop constraint MY_EMP_ID_PK
/
Table altered.

alter table my_jobs drop constraint MY_JOB_ID_PK
/
Table altered.

select * from inferior_vw sample (5)
/
select * from inferior_vw sample (5)
*
ERROR at line 1:
ORA-01445: cannot select ROWID from, or sample, a join view without a key-preserved table

SAMPLE Limitation 2: Discrepancies in Actual Percentage of Data Returned

I might be able to live with limitation #1, but the discrepancy limitation drove me up a wall. I think it’s unreasonable, but maybe not to others, so I offer my demonstration and let you be the judge for your own needs.

The following shows an anonymous pl/sql block where I asked for a sample of 10%, but the actual number of rows returned varied (greatly IMHO), even in just 10 attempts.

declare
i number;
c number;
total number;
begin
select count(1)
into total
from hr.employees
;
for i in 1..10
loop
select count(1)
into c
from hr.employees sample (10)
;
dbms_output.put_line(round((c/total)*100,0)||’%’);
end loop;
end;
/
9%
7%
9%
15%
9%
10%
11%
12%
18%
9%

Long story short (and maybe I expect too much), I’m too impatient to deal with the inferior SAMPLE clause. I have a tried and true bag of tricks to help me randomly selecting a percentage of data for purposes of validating application code changes. That will be for another post.

Hope some of this is of help. Enjoy!

Learning to be patient,
Walt Silva

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: