I got a situation where I know I got duplicate records by EMAIL in a data table.
This is roughly how the record would look like
CUSTOMER_ID | LEGACY_ID | EMAIL | FIRSTNAME | SURNAME
New records inserted to the database since 2018 will contain a CUSTOMER_ID. All records inserted before 2018 will have a LEGACY_ID.
If a member record entered before 2018, wants to be updated to comply to the new system, their record will be updated with a CUSTOMER_ID and the LEGACY_ID will still remain.
Due to some process failures some existing members with LEGACY_ID was inserted into the data table again...hence creating another record with just CUSTOMER_ID
I need to do a SELECT query to display records that have duplicate EMAIL and only LEGACY_ID.
For example, if there are two records with same EMAIL, one has only LEGACY_ID, and the other has only CUSTOMER_ID. Then return the duplicate record with LEGACY_ID only. I want to retain the record with CUSTOMER_ID.
Any ideas how I can do this?