algorithm - Hashing and encryption technique for a huge data set containing phone numbers
Description of the problem: I'm in the process of working with a highly sensitive data set that contains people's phone number information as one of its columns. I need to apply an encryption or hash function to the phone numbers to convert them into encoded values, and then do my analysis on those. It can be a one-way hash, i.e., after processing the encrypted data we won't be converting the values back to the original phone numbers. Essentially, I am looking for an anonymizer that takes phone numbers and converts them to random values on which I can do my processing. Please suggest the best way to go about this. Recommendations on the best algorithms to use are welcome.
Update: size of the dataset. The dataset is huge, on the order of hundreds of GB.
Update: sensitive. By sensitive, I meant that the phone number should not be part of our analysis. So I need a one-way hashing function without redundancy: each phone number should map to a unique value, and two phone numbers should not map to the same value.
Update: implementation?
Thanks for your answers. I am looking for a more elaborate implementation. I was going through Python's hashlib library for hashing; does it perform the same set of steps that you suggested? Here is the link
Can someone give me example code to achieve this process, preferably in Python?
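Generate a key for your data set (16 or 32 bytes) and keep it secret. Use HMAC-SHA1 on your data with this key, then base64-encode the result. You get a random-looking string per phone number that isn't reversible without the key. The same phone number always maps to the same string, and with a 160-bit output two different numbers mapping to the same value is practically impossible.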
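Since the question mentions hashlib, here is a minimal sketch of the same idea using only Python's standard library (hmac, hashlib, secrets, base64). The function names generate_key and anonymize are my own for illustration, and the sketch assumes the phone numbers are already in one consistent format; "555-1234" and "5551234" would otherwise hash to different values.

```python
import base64
import hashlib
import hmac
import secrets


def generate_key(num_bytes=32):
    """Return a random secret key. Store it safely and reuse it for the whole data set."""
    return secrets.token_bytes(num_bytes)


def anonymize(phone_num, key):
    """Return the base64-encoded HMAC-SHA1 of a phone number under the secret key."""
    digest = hmac.new(key, phone_num.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

Call generate_key() once, keep the result secret, and pass the same key to every anonymize() call so that equal phone numbers still map to equal tokens. If the key is thrown away after processing, nobody can recompute the mapping later.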
Example (HMAC-SHA1 with a 256-bit key) using Keyczar:
Create a random secret key:
    $> python keyczart.py create --location=path_to_key_set --purpose=sign
    $> python keyczart.py addkey --location=path_to_key_set --status=primary
Anonymize a phone number:
    from keyczar import keyczar

    def anonymize(phone_num):
        # Load the signing key set created above and sign (HMAC) the number.
        signer = keyczar.Signer.Read("path_to_key_set")
        return signer.Sign(phone_num)
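For a data set of hundreds of GB you would not want to load everything into memory. As a rough sketch, assuming the data is a CSV file with a header row, the column can be rewritten one row at a time. The function name anonymize_column and the column name passed to it are hypothetical, and this version uses the standard library hmac module rather than Keyczar.

```python
import base64
import csv
import hashlib
import hmac


def anonymize_column(in_file, out_file, key, column):
    """Stream rows from in_file to out_file, replacing `column` with its HMAC-SHA1."""
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # Only one row is held in memory at a time.
        digest = hmac.new(key, row[column].encode("utf-8"), hashlib.sha1).digest()
        row[column] = base64.b64encode(digest).decode("ascii")
        writer.writerow(row)
```

Open both files with newline="" as the csv module recommends, and pass the same secret key for the whole run. Because each row is handled independently, the input could also be split into chunks and processed in parallel with the same key.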