algorithm - Hashing and encryption technique for a huge data set containing phone numbers


Description of the problem: I'm in the process of working with a highly sensitive data set that contains people's phone numbers as one of its columns. I need to apply an encryption or hash function to them to convert them into encoded values, and then do my analysis. It can be a one-way hash, i.e., after processing the encrypted data we won't be converting it back to the original phone numbers. Essentially, I am looking for an anonymizer that takes phone numbers and converts them to some random value on which I can do my processing. Please suggest the best way to go about this. Recommendations on the best algorithms to use are welcome.

Update: size of the dataset. My dataset is huge, in the hundreds of GB.

Update: sensitivity. By sensitive, I meant that the phone number itself should not be part of our analysis. So I need a one-way hashing function without redundancy: each phone number should map to a unique value, and two phone numbers should not map to the same value.

Update: implementation?

Thanks for your answers. I am looking for an elaborate implementation. I was going through Python's hashlib library for hashing; is that the same set of steps you suggested? Here is the link.

Can someone give me example code to achieve this process, preferably in Python?

Generate a key for your data set (16 or 32 bytes) and keep it secret. Use HMAC-SHA1 on your data with this key, then Base64-encode the result, and you have a random-looking unique string per phone number that isn't reversible (without the key).
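Since the question mentions hashlib, the same steps can probably be sketched with only the Python standard library (hmac, hashlib, base64). This is a minimal sketch, not a vetted implementation: the names SECRET_KEY and anonymize are hypothetical, and it assumes the key is generated once, stored somewhere safe, and reused for the whole data set.

```python
import base64
import hashlib
import hmac
import secrets

# Assumption: generated here only for illustration. In practice the key would be
# created once, kept secret, and loaded from secure storage on every run.
SECRET_KEY = secrets.token_bytes(32)


def anonymize(phone_num: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed, non-reversible token for a phone number (HMAC-SHA1 + Base64)."""
    digest = hmac.new(key, phone_num.encode("utf-8"), hashlib.sha1).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


if __name__ == "__main__":
    print(anonymize("+15551234567"))
```

The same number always yields the same token under the same key, so joins and group-bys still work on the anonymized column. Two caveats: no hash can strictly guarantee uniqueness, although with a 160-bit output a collision between two phone numbers is vanishingly unlikely; and differently formatted versions of one number (for example with and without a country code) produce different tokens, so it may be worth normalizing numbers first.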

Example (HMAC-SHA1 with a 256-bit key) using Keyczar:

Create a random secret key:

$> python keyczart.py create --location=path_to_key_set --purpose=sign
$> python keyczart.py addkey --location=path_to_key_set --status=primary

Anonymize a phone number:

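from keyczar import keyczar

def anonymize(phone_num):
    # Load the signing key set created with keyczart above
    signer = keyczar.Signer.Read("path_to_key_set")
    # The signature is the keyed, one-way token for this phone number
    return signer.Sign(phone_num)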
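For a data set in the hundreds of GB, the anonymization would probably be applied row by row as a stream rather than by loading everything into memory. The following is a rough sketch under stated assumptions: the data is a CSV file with a header row, the phone column is called "phone", and the function name anonymize_csv is hypothetical. It uses the standard library's hmac directly instead of Keyczar so that it stays self-contained.

```python
import csv
import hashlib
import hmac


def anonymize_csv(in_path, out_path, key, column="phone"):
    """Stream a CSV file, replacing one column with its HMAC-SHA1 hex digest.

    Assumptions: the file has a header row and `column` names the phone field.
    Rows are processed one at a time, so memory use stays roughly constant.
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            value = row[column].encode("utf-8")
            row[column] = hmac.new(key, value, hashlib.sha1).hexdigest()
            writer.writerow(row)
```

Because every row is handled independently with the same key, the input could also be split into chunks and processed in parallel, as long as all workers share that key.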
