Good afternoon. I need to select a specific number of random rows from a table, and while I believe my logic is right it's taking too long, 30 minutes for a routine data size. Hopefully someone here can show me a more efficient query. I've seen the SAMPLE function, but it just randomly selects rows on a one-by-one basis, without a guaranteed total count.
This is the idea:
INSERT INTO Tmp_Table (Value, Sequence) SELECT Value FROM Perm_Table, DBMS_RANDOM.VALUE;
SELECT Value FROM Tmp_Table WHERE ROWNUM <= 1234 ORDER BY Sequence;
I'd need to put the ORDER BY in a subselect for ROWNUM to work correctly, but anyway that's just an illustration. My actual need is a little more complicated. I have many sets of data; each set has many rows; and for each set I need to return a specific (different) number of rows. Perhaps project A has three rows in this table, and I want to keep two of them; project B has two rows, and I want to keep one of them. So I need to identify, for each row, whether it's valid for that project. This is what my data looks like:
Project Person Sequence Position Keeper
A Bill 1234 1 Yes
A Fred 5678 3 No
A George 1927 2 Yes
B April 5784 2 No
B Janice 2691 1 Yes
I populate Sequence with random values, then calculate the position of each person within their project, and finally discard people who's Position is greater than Max_Targets for the Project. Fred and April have the highest random numbers, so they're cut. It's not the case that I'm just trimming one person from each project; the actual percentage kept will range from zero to 100.
Populating the list with random values is not time-consuming, but calculating Position is. This is my code:
UPDATE Tmp_Targets T1
SET Position =
(
SELECT
COUNT(*)
FROM
Perm_Targets PT1
INNER JOIN Perm_Targets PT2 ON PT1.Project = PT2.Project
INNER JOIN Tmp_Targets T2 ON PT2.Target = T2.Target
WHERE
T1.Target = PT1.Target
AND T2.Sequence <= T1.Sequence
);
The Target fields are PKs, and the Project and Sequence fields are indexed. Is there a better way to approach this? I could write a cursor that pulls out project codes and performs the above operations for each project in turn; that would be logically simpler and possibly faster. Has anyone here addressed a similar problem before? I'd appreciate any ideas.
This is on 9.2, in case it matters. Thank you,
Jonathan