Merging datasets with NULL values
A MERGE may appear to work properly, but it's important you check your target dataset to make sure it does. This tip explains why and how.
The MERGE functionality introduced in Oracle9i offers a simplified, yet powerful enhancement to the commonly known UPSERT approach of earlier editions. Just as its predecessor, MERGE is a valuable functionality when integrating same datasets over time, (i.e. when extracting incremental data from a production environment data source and merging it into a holding/collection area as is often necessary for data migration projects). It allows already existing records to be updated and new records to be inserted into the holding/collection area. However, its weakness of handling NULL values has lead to inserts instead of updates, leaving us with redundant records on the one hand and outdated records on the other. Consequently, our data became unreliable. Several of our scripts and applications started throwing errors or returned erroneous results. This can be avoided by utilizing the tip I submitted.
A MERGE (which is to be done on the PK fields of the tables involved) might appear to work properly, however, carefully check your target dataset by applying unique indicies. Reason being, the moment you have NULL values in any of your PK fields (which can happen for instance when migrating less then perfect data) the comparison of these two nulls will evaluate to unknown. This will cause an insert rather than an update, leading to undesired record duplicates and failed updates. This has been tested on Oracle9i Enterprise Edition Release 184.108.40.206.0 and Oracle9i Enterprise Edition Release 220.127.116.11.0.
For example: a simple merge approach would be:
CREATE TABLE MERGE_TEST1 ( SOMEID NUMBER(2), DESC1 VARCHAR2(2), DESC2 VARCHAR2(2)); CREATE TABLE MERGE_TEST2 ( SOMEID NUMBER(2), DESC1 VARCHAR2(2), DESC2 VARCHAR2(2)); CREATE UNIQUE INDEX MERGE_TEST1_IDX ON MERGE_TEST1 (SOMEID, DESC1); INSERT INTO MERGE_TEST1 VALUES (1, '', 'A'); INSERT INTO MERGE_TEST1 VALUES (2, 'B', 'B'); INSERT INTO MERGE_TEST1 VALUES (3, 'C', 'C'); INSERT INTO MERGE_TEST1 VALUES (4, 'D', 'D'); INSERT INTO MERGE_TEST2 VALUES (1, '', 'B'); INSERT INTO MERGE_TEST2 VALUES (2, 'B', 'B'); INSERT INTO MERGE_TEST2 (SOMEID, DESC2) VALUES (3, 'C'); INSERT INTO MERGE_TEST2 VALUES (4, 'D', 'D'); MERGE INTO MERGE_TEST2 AUSING (SELECT * FROM MERGE_TEST1 ) B ON ( A.SOMEID = B.SOMEID AND A.DESC1 = B.DESC1) WHEN MATCHED THEN UPDATE SET A.DESC2 = B.DESC2 WHEN NOT MATCHED THEN INSERT( A.SOMEID, A.DESC1, A.DESC2) VALUES( B.SOMEID, B.DESC1, B.DESC2) /
However, the resuts are as follows:
SQL> select * from merge_test2; SOMEID DE DE ---------- -- -- 1 B << --- should have been updated 2 B B 3 C 4 D D 3 C C 1 A << --- should not have been inserted
In order to avoid this dilemma, you should use the function SYS_OP_MAP_NONNULL function for NULL comparisons. (This is currently NOT documented!) The enhanced MERGE statement looks as follows:
MERGE INTO MERGE_TEST2 A USING (SELECT * FROM MERGE_TEST1 ) B ON ( SYS_OP_MAP_NONNULL(A.SOMEID) = SYS_OP_MAP_NONNULL(B.SOMEID) AND SYS_OP_MAP_NONNULL(A.DESC1) = SYS_OP_MAP_NONNULL(B.DESC1)) WHEN MATCHED THEN UPDATE SET A.DESC2 = B.DESC2 WHEN NOT MATCHED THEN INSERT( A.SOMEID, A.DESC1, A.DESC2) VALUES( B.SOMEID, B.DESC1, B.DESC2) /
... and voila, yields the desired results:
SQL> select * from merge_test2; SOMEID DE DE ---------- -- -- 1 A 2 B B 3 C 4 D D 3 C C
Daniel C. writes: One correction to the article: Note the example uses a unique key constraint, which does allow NULLs. Other than that, it's a good tip, especially since it mentions an undocumented feature (SYS_OP_MAP_NONNULL), which by the way is not mentioned anywhere in the 9iR2 or 10g docs.