CRC as a unique file signature
843811 Mar 2 2007, edited Mar 4 2007
Hello,
I wasn't sure where to post this - so I am trying this forum first.
I am working on a document management system which will handle hundreds of thousands of documents varying in size from 20 KB to 2 MB. There is strong potential for duplicate documents to be put into the system, so what I need to do is detect whether a document is already in the system without relying on the filename.
At first glance it looks like using a combination of CRC and data size would work well as a "fingerprint" of each file, but I wanted to get some expert opinion on the matter. Is the CRC for a file unique? Or could a file of exactly the same size but with different content have the same CRC?
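To make the idea concrete, here is a rough sketch of the kind of fingerprint I have in mind. It assumes Java and the standard java.util.zip.CRC32 class; the FileFingerprint class and its method names are made up for illustration and are not part of any existing system. It only computes the CRC/size pair and says nothing about whether that pair is actually unique.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.CRC32;

// Hypothetical helper: pairs a CRC-32 checksum with the byte count of the data.
final class FileFingerprint {
    final long crc;   // CRC-32 value, in the range 0 .. 2^32 - 1
    final long size;  // number of bytes read

    FileFingerprint(long crc, long size) {
        this.crc = crc;
        this.size = size;
    }

    // Streams the input in chunks so a whole document is never held in memory.
    static FileFingerprint of(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            crc.update(buf, 0, n);
            total += n;
        }
        return new FileFingerprint(crc.getValue(), total);
    }

    // Convenience variant for data that is already in memory.
    static FileFingerprint ofBytes(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return new FileFingerprint(crc.getValue(), data.length);
    }

    // Two fingerprints match only if both the CRC and the size match.
    boolean matches(FileFingerprint other) {
        return crc == other.crc && size == other.size;
    }

    @Override
    public String toString() {
        return String.format("%08x-%d", crc, size);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FileFingerprint <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(FileFingerprint.of(in));
        }
    }
}
```

The plan would be to store this pair alongside each document and compare it against incoming files before accepting them.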
Your help is appreciated.
-X