Random sampling of records in a data table
633461Apr 11 2008 — edited Apr 11 2008I had a doubt regarding random sampling of data .....it goes as such.
i have a table of multiple records and i want to frame a query which picks random 10% samples/records from the table and displays it.
there are 3 diff queries related to that...whic one is efficient and the right one .please guide.
1)
SQL> select * from EMP sample(10);
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7876 ADAMS CLERK 7788 23-MAY-87 1100 20
SQL> select * from EMP sample(50);
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
7369 SMITH CLERK 7902 17-DEC-80 800 20
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30
7566 JONES MANAGER 7839 02-APR-81 2975 20
7782 CLARK MANAGER 7839 09-JUN-81 2450 10
7839 KING PRESIDENT 17-NOV-81 5000 10
7876 ADAMS CLERK 7788 23-MAY-87 1100 20
7934 MILLER CLERK 7782 23-JAN-82 1300 10
7 rows selected.
In the above execution of the query, we use the sample clause to randomly sample some percentage of data. SAMPLE(n) – n is the number denoting the percentage.
One thing which I read about this clause is it works only for single table queries on local tables. If we want to use this clause in a multi table or remote query then we have to create inline view on the driving table of the query with sample clause.
SELECT T1.DEPT,T2.EMP
FROM (SELECT * FROM DEPT SAMPLE(5)) T1,EMP T2
WHERE T1.DEPT_ID=T2.DEPT_ID;
2)
With reference to the last query mentioned in the 1st query,it worked (but only for the 1st run) !
Consecutive runs - no rows are selected
SQL> SELECT d.deptno, e.ename
2 FROM (SELECT * FROM dept SAMPLE(5)) d, emp e
3 WHERE d.deptno=e.deptno
4 ;
DEPTNO ENAME
---------- ----------
10 CLARK
10 KING
10 MILLER
3)
Is there any other way to do it without using the sample function?
This is another query;
select * from emp where rownum<=(select trunc(.20*count(*),0) from emp)
but it samples only the first 20% records.
Can we put some condition on rownum and make the output random.