Hello, I'm a beginner of SQL.
Could you help me how to use 'dbms_random.seed()' ?
I want to set seed and select 100 patients' data randomly from source data set.
Source data set is like below. It contains more than 1700000 patients' data. One patient can have multiple records.
From this source data set, I want to select 100 patients randomly and extract their data with date >=2015-01-01.
The 100 patients should have drugcode=1000 with date >=2016-04-01.
PATIENTID
| DATE
| DRUGCODE |
|---|
| 111 | 2015-4-15 | 1000 |
| 111 | 2014-6-20 | 1001 |
| 111 | 2016-7-6 | 1002 |
| 331 | 2014-12-3 | 1010 |
| 331 | 2019-1-4 | 1000 |
| 999 | 2018-10-13 | 1009 |
| 999 | 2018-6-15 | 1001 |
| 999 | 2018-1-7 | 1007 |
| 411 | 2017-5-14 | 1000 |
| 411 | 2016-11-29 | 1010 |
| 201 | 2014-7-19 | 1002 |
| 201 | 2018-8-11 | 1003 |
| 101 | 2014-5-19 | 1000 |
| 151 | 2019-12-4 | 1006 |
| 151 | 2016-6-10 | 1005 |
| 151 | 2020-1-30 | 1000 |
| 888 | 2014-11-7 | 1010 |
| 888 | 2018-12-25 | 1000 |
In this example, we have 4 patients(331, 411,151,and 888) who have drug code=1000 after 2016-04-01.
Lets say we want to randomly-select 2 patients among these 4.
If the selected 2 patients were 411 and 888, the desired result is like below.
One of the records of patient 888 should be removed because date is not after 2015-01-01.
PATIENTID
| DATE
| DRUGCODE
|
|---|
| 411 | 2017-5-14 | 1000 |
| 411 | 2016-11-29 | 1010 |
| 888 | 2018-12-25 | 1000 |
I tried the following SQL in SPL PLUS but it seemed that SEED did not work.
I executed 'call' every time with the same seed number, but got data of different 100 patients.
========================================================
call dbms_random.seed(11111);
SELECT source.*
FROM source
WHERE
source.date >=2015-01-01 AND
source.patientid IN
(
SELECT b.patientid
FROM
(
SELECT rownum, a.*, dbms_random.value() RAND
FROM
(
SELECT DISTINCT source.patientid
FROM source
WHERE
source.drugcode = 1000 AND
source.date>=2016-04-01
ORDER BY source.patientid
) a
ORDER BY RAND
) b
WHERE ROWNUM<=100
)
====================================================
Strangely, when I tried the following SQL , SEED does work.
I got the same 100 PATIENTID who have drug code=1000 after 2016-04-01.
====================================================
call dbms_random.seed(11111);
SELECT b.patientid
FROM
(
SELECT rownum, a.*, dbms_random.value() RAND
FROM
(
SELECT DISTINCT source.patientid
FROM source
WHERE
source.drugcode = 1000 AND
source.date>=2016-04-01
ORDER BY source.patientid
) a
ORDER BY RAND
) b
WHERE ROWNUM<=100
========================================================
Is there any problem to put this SQL into WHERE IN () ??
I really appreciate your help.
Thank you!