Tamara Wilhite is a technical writer, industrial engineer, mother of two, and published sci-fi and horror author.
Data collisions in databases occur when the same identifier is used for two different pieces or types of data. Data collisions can occur when two systems are being merged into one system or during data imports from one database into another.
When a data collision occurs, administrators or the software itself must decide which identifier and metadata are correct, or whether a new identifier must be assigned to one of the entries.
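To make the definition concrete, here is a minimal Python sketch of how collisions between two systems might be detected. The record layout, the sample identifiers and the `find_collisions` helper are hypothetical, chosen only for illustration. Real databases would normally do this check with a query rather than in memory.

```python
def find_collisions(existing, incoming):
    """Return identifiers used in both record sets for different data."""
    # Identifiers that appear in both systems.
    shared = existing.keys() & incoming.keys()
    # Only a shared identifier with differing data is a collision.
    return sorted(key for key in shared if existing[key] != incoming[key])


# Hypothetical records keyed by identifier.
system_a = {"1001": {"name": "Bracket"}, "1002": {"name": "Hinge"}}
system_b = {"1002": {"name": "Gasket"}, "1003": {"name": "Washer"}}

collisions = find_collisions(system_a, system_b)
```

In this sample, identifier `1002` refers to a hinge in one system and a gasket in the other, so it is the record someone (or some rule) has to resolve.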
How to Prevent Data Collisions
- Assign each database a different "seed" to use as the first part of its unique identifiers. When the systems are merged, records from different systems carry different seeds and thus will not conflict.
- Review the rules used for resolving data collisions. While a common default is to keep the newest record, some records should not be overwritten but combined. This is especially true for medical records, insurance records and financial records.
- Test the database merger prior to the production run and set up database monitoring tools to report data collisions. Review the data collisions that occur and check whether the rules for resolving them are appropriate for the database. While labor intensive, this review helps prevent loss of information in the database merger. Testing also reduces the risk of having to migrate the data again because data collisions were not properly taken into account.
- If you are going to upgrade a database from one application to another, verify that the naming scheme that will be used for new records won't create record identifiers that could conflict with old ones - or overwrite the existing files.
- Compare the unique identifiers in the two databases to be merged to look for potential data conflicts before the systems are merged. Combine the information in a single test database prior to going live. If collisions are identified, rename the records in at least one system in a standardized way so that the newly renamed records don't end up sharing the same new name.
- Limit users' ability to create unique identifiers that are not based upon the "seed". When users can select generic titles like "my report 2012" or "September sales pitch", the risk of data collisions increases.
- If unique identifiers for records are tied to project names or file owners, ensure that combinations of various projects and user names cannot accidentally create a shared UID.
- Add a letter or seed to imported vendor part numbers so their part numbers do not conflict with your own part numbers. Adding additional characters to the record identifiers of the incoming system makes them unique, preventing a data collision.
- Use two or more sources of information to create a unique identifier. A unique identifier system based on a part number and CAGE code will conflict far less often than a system based on part numbers alone.
- Assign unique identifiers based on a value that will rarely overlap. A unique identifier system based on employee numbers will not conflict as often as a system based on first and last names. A unique identifier based on social security numbers will not collide as frequently as one based on initials and birth dates.
- Ensure that new records being imported into the database don't conflict with existing record identifiers before you import the data en masse.
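
Several of the tips above (seeding identifiers, combining two sources into one key, and checking for conflicts before a bulk import) can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the function names `add_seed`, `composite_key` and `safe_merge`, the seed letters and the sample part data are all hypothetical, and a production system would enforce the same rules with database constraints.

```python
def add_seed(records, seed):
    """Prefix every identifier with a system-specific seed."""
    return {f"{seed}-{key}": value for key, value in records.items()}


def composite_key(part_number, cage_code):
    """Build an identifier from two sources instead of one."""
    return f"{cage_code}:{part_number}"


def safe_merge(existing, incoming):
    """Merge two record sets, refusing to proceed if any identifier overlaps."""
    overlap = sorted(existing.keys() & incoming.keys())
    if overlap:
        # Report the collisions instead of silently overwriting records.
        raise ValueError(f"identifier collisions: {overlap}")
    return {**existing, **incoming}


# Hypothetical data: both systems use part number 1002 for different parts.
ours = {"1002": "Hinge"}
vendor = {"1002": "Gasket"}

# Seeding each system's identifiers removes the overlap before the merge.
merged = safe_merge(add_seed(ours, "A"), add_seed(vendor, "V"))
```

Calling `safe_merge(ours, vendor)` directly raises a `ValueError` that lists `1002`, which is the kind of report a test run of the merger should surface before the production import. Note that this sketch treats any shared identifier as a collision, even if the two records happen to be identical.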