The Power of Analytic Functions
* 원문에서 필요한 부분만 발췌한 자료이므로 필요할 경우 원문을 확인하세요.
In this article we show the power of analytic functions on an example
: Removing Duplicate Rows.
-- 중복값이 많아서 많은 행은 삭제해야 하는 테이블 확인
select count(*), count(distinct OBJECT_NAME)
from t;
COUNT(*) COUNT(DISTINCT OBJECT_NAME)
---------- ---------------------------
1697024 5017
Here you can see that we have more than 1.6 million rows but only 5'017 unique ones. We need to delete 1'697'024 - 5'017 = 1'692'007 of my rows (lots). Let's see how speedy this can be done using analytic functions.
The normal (non Analytic Function) Approach:
-- 서브쿼리를 이용한 삭제 : 시간과 자원이 매우 많이 소요된다.
delete from t
where rowid not in (select min(rowid)
from t group by OBJECT_NAME);
This statements runs hours and hours, using nearly 100% CPU Power and even after increasing the rollback segment tablespace to 7GB, we are not getting the desired results. This is not the way to go!
For the sake of clarity we first demonstrate the deletion on a small table - we duplicate the EMP table: insert into emp (select * from emp), then we want to delete duplicate ENAME's.
The Analytic Function Approach:
The approach to removing duplicates is a little different. We would generate the set of ROWID's to delete by using analytics and then delete them, like this:
select count(*), count(distinct ENAME) from emp;
COUNT(*) COUNT(DISTINCT ENAME)
---------- ---------------------
28 14
Let's look closer to the solution - the inner query shows the groups:
SELECT ename Ename, deptno Deptno, rowid Rid,
ROW_NUMBER()
OVER (
PARTITION BY ename
ORDER BY rowid
) RNo
FROM emp
/
Ename Deptno Rid RNo
---------- ---------- ------------------ ----------
ADAMS 20 AAABrzAAEAAAAASAAK 1
ADAMS 20 AAABrzAAEAAAAASAAY 2
ALLEN 30 AAABrzAAEAAAAASAAB 1
ALLEN 30 AAABrzAAEAAAAASAAP 2
BLAKE 30 AAABrzAAEAAAAASAAF 1
BLAKE 30 AAABrzAAEAAAAASAAT 2
CLARK 10 AAABrzAAEAAAAASAAG 1
CLARK 10 AAABrzAAEAAAAASAAU 2
FORD 20 AAABrzAAEAAAAASAAM 1
FORD 20 AAABrzAAEAAAAASAAa 2
JAMES 30 AAABrzAAEAAAAASAAL 1
JAMES 30 AAABrzAAEAAAAASAAZ 2
JONES 20 AAABrzAAEAAAAASAAD 1
JONES 20 AAABrzAAEAAAAASAAR 2
KING 10 AAABrzAAEAAAAASAAI 1
KING 10 AAABrzAAEAAAAASAAW 2
MARTIN 30 AAABrzAAEAAAAASAAE 1
MARTIN 30 AAABrzAAEAAAAASAAS 2
MILLER 10 AAABrzAAEAAAAASAAN 1
MILLER 10 AAABrzAAEAAAAASAAb 2
SCOTT 20 AAABrzAAEAAAAASAAH 1
SCOTT 20 AAABrzAAEAAAAASAAV 2
SMITH 20 AAABrzAAEAAAAASAAA 1
SMITH 20 AAABrzAAEAAAAASAAO 2
TURNER 30 AAABrzAAEAAAAASAAJ 1
TURNER 30 AAABrzAAEAAAAASAAX 2
WARD 30 AAABrzAAEAAAAASAAC 1
WARD 30 AAABrzAAEAAAAASAAQ 2
Now show only the Rows which must be deleted, these are all Rows with RNo <> 1.
SELECT Ename, Deptno, Rid, RNo
FROM
(SELECT ename Ename, deptno Deptno, rowid Rid,
ROW_NUMBER()
OVER (
PARTITION BY ename
ORDER BY rowid
) RNo
FROM emp
)
WHERE RNo <> 1
/
ENAME DEPTNO RID RNO
---------- ---------- ------------------ ----------
ADAMS 20 AAABrzAAEAAAAASAAY 2
ALLEN 30 AAABrzAAEAAAAASAAP 2
BLAKE 30 AAABrzAAEAAAAASAAT 2
CLARK 10 AAABrzAAEAAAAASAAU 2
FORD 20 AAABrzAAEAAAAASAAa 2
JAMES 30 AAABrzAAEAAAAASAAZ 2
JONES 20 AAABrzAAEAAAAASAAR 2
KING 10 AAABrzAAEAAAAASAAW 2
MARTIN 30 AAABrzAAEAAAAASAAS 2
MILLER 10 AAABrzAAEAAAAASAAb 2
SCOTT 20 AAABrzAAEAAAAASAAV 2
SMITH 20 AAABrzAAEAAAAASAAO 2
TURNER 30 AAABrzAAEAAAAASAAX 2
WARD 30 AAABrzAAEAAAAASAAQ 2
Now these Rows can easily be deleted:
DELETE FROM emp
WHERE rowid IN
(SELECT Rid
FROM
(SELECT rowid Rid,
ROW_NUMBER()
OVER (
PARTITION BY ename
ORDER BY rowid
) RNo
FROM emp
)
where RNo <> 1
);
14 rows deleted.
Conclusion
Now, we go back to the huge Table: t
DELETE FROM t NOLOGGING
WHERE rowid IN
(SELECT Rid
FROM
(SELECT rowid Rid,
ROW_NUMBER()
OVER (
PARTITION BY object_name
ORDER BY rowid
) RNo
FROM t
)
where RNo <> 1
);
1692007 rows deleted.
Elapsed: 00:08:57.15
Execution Plan
----------------------------------------------------------
0 DELETE STATEMENT Optimizer=CHOOSE
1 0 DELETE OF 'T'
2 1 NESTED LOOPS
3 2 VIEW OF 'VW_NSO_1'
4 3 SORT (UNIQUE)
5 4 VIEW
6 5 WINDOW (SORT)
7 6 TABLE ACCESS (FULL) OF 'T'
8 2 TABLE ACCESS (BY USER ROWID) OF 'T'
The huge Deletion was complete after 9 Minutes instead of hours - Great!
select count(*), count(distinct OBJECT_NAME) from t;
COUNT(*) COUNT(DISTINCT OBJECT_NAME)
---------- --------------------------
5017 5017
No more duplicate Rows on OBJECT_NAME.
More Information about Analytic Functions can be found here.
출처 : http://www.akadia.com/services/ora_analytic_functions2.html