How to Encrypt and Decrypt a File in Python Using Databricks

What is Encryption

Encryption is a data security measure that encodes information by converting plaintext into ciphertext using a key.

What is an Encryption Key

In cryptography, an encryption key is a string of bits used to scramble (encrypt) and unscramble (decrypt) data. It is a central component of data security.

What is Decryption

Decryption is the process of converting encrypted data back to its original, unencrypted form.

To install the cryptography package in Python (in a Databricks notebook cell, run it as %pip install cryptography):

pip install cryptography

Import the modules needed for encryption
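The original import cell is not shown. A plausible minimal set, assuming the tutorial uses the Fernet symmetric-encryption recipe from the cryptography package, is:

```python
# Assumes the cryptography package is installed (pip install cryptography).
import os          # building and listing file paths
import tempfile    # creating a temporary file to hold the key
from cryptography.fernet import Fernet  # symmetric (same-key) encryption
```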

Generate Key for File Encryption
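A minimal sketch of this step, assuming Fernet is the cipher: it generates a key and writes it to a temporary file with Python's tempfile module, which is useful when you cannot write to DBFS. The variable names key, key_path and loaded_key are illustrative, not taken from the original code.

```python
import tempfile
from cryptography.fernet import Fernet

# Generate a random Fernet key (32 random bytes, URL-safe base64-encoded).
key = Fernet.generate_key()

# Store the key in a temporary file; delete=False keeps the file after closing.
with tempfile.NamedTemporaryFile(delete=False) as key_file:
    key_file.write(key)
    key_path = key_file.name

print(key_path)  # prints the temporary file's path

# Later, read the key back from the temporary file.
with open(key_path, "rb") as f:
    loaded_key = f.read()
```

Keep the key somewhere safe: anything encrypted with it cannot be decrypted without it.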

The code above generates a key and stores it in a temporary file whose path looks like this: /tmp/tmpp7yct_de

Note: If you do not have WRITE access to Databricks DBFS (that is, you have only READ access), create a temporary file as shown above to store the key.

Code to access DBFS file path
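On most Databricks clusters, DBFS is also exposed on the driver's local file system under /dbfs, so Python's built-in open() can reach a file stored at dbfs:/FileStore/... through the path /dbfs/FileStore/.... The helper below, to_local_dbfs_path, is hypothetical and simply converts between the two forms:

```python
def to_local_dbfs_path(dbfs_uri: str) -> str:
    """Map a dbfs:/ URI (used by Spark and dbutils) to the /dbfs/ path used by open()."""
    prefix = "dbfs:/"
    if dbfs_uri.startswith(prefix):
        return "/dbfs/" + dbfs_uri[len(prefix):]
    # Already a local path (or not a DBFS URI): return it unchanged.
    return dbfs_uri


print(to_local_dbfs_path("dbfs:/FileStore/tables/sample.csv"))
# /dbfs/FileStore/tables/sample.csv
```

In a notebook you can also browse DBFS directly with dbutils.fs.ls("dbfs:/FileStore/"); dbutils exists only inside Databricks, so it is left out of the sketch.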

To specify the correct file path (add this if needed)
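One way to build the full path is os.path.join, which inserts the separator for you. The folder and file name below are placeholders, not values from the original tutorial:

```python
import os

# Placeholder folder and file name; replace with your own DBFS location.
file_path = "/dbfs/FileStore/tables"
file_name = "sample.csv"

# os.path.join avoids missing or doubled slashes between folder and file.
full_path = os.path.join(file_path, file_name)
print(full_path)  # /dbfs/FileStore/tables/sample.csv
```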

Encrypt the file using the generated key
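A sketch of the encryption step, assuming Fernet and assuming the file is overwritten in place with its ciphertext (the original code may instead write to a new file). encrypt_file is a hypothetical helper name, and the demo uses a throwaway file in a temporary folder so it also runs outside Databricks:

```python
import os
import tempfile
from cryptography.fernet import Fernet


def encrypt_file(path: str, key: bytes) -> bytes:
    """Encrypt the file at `path` in place with Fernet and return the ciphertext."""
    fernet = Fernet(key)
    with open(path, "rb") as f:
        original = f.read()
    encrypted_data = fernet.encrypt(original)
    with open(path, "wb") as f:
        f.write(encrypted_data)
    return encrypted_data


# Demo on a throwaway file that stands in for your DBFS file.
key = Fernet.generate_key()
demo_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(demo_path, "wb") as f:
    f.write(b"id,name\n1,Alice\n")

encrypted_data = encrypt_file(demo_path, key)
```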

List the file directory (add this code if needed)
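To check which files are in a folder (for example, to confirm the file you want to encrypt is there), os.listdir works on /dbfs/... paths as well as on ordinary local folders. list_directory is a hypothetical wrapper, and the demo folder is temporary:

```python
import os
import tempfile


def list_directory(path: str) -> list:
    """Return the sorted names of the entries in `path`."""
    return sorted(os.listdir(path))


# Demo: a temporary folder with two empty files (stand-in for a /dbfs/... folder).
demo_dir = tempfile.mkdtemp()
for name in ("a.csv", "b.csv"):
    open(os.path.join(demo_dir, name), "w").close()

print(list_directory(demo_dir))  # ['a.csv', 'b.csv']
```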

Decrypt the encrypted file
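The matching decryption sketch, under the same assumptions (Fernet, in-place overwrite, hypothetical helper name decrypt_file). Decryption only succeeds with the same key that encrypted the file:

```python
import os
import tempfile
from cryptography.fernet import Fernet


def decrypt_file(path: str, key: bytes) -> bytes:
    """Decrypt the Fernet-encrypted file at `path` in place and return the plaintext."""
    fernet = Fernet(key)
    with open(path, "rb") as f:
        encrypted = f.read()
    # Raises cryptography.fernet.InvalidToken if the key is wrong or the data was altered.
    decrypted_data = fernet.decrypt(encrypted)
    with open(path, "wb") as f:
        f.write(decrypted_data)
    return decrypted_data


# Demo: write an encrypted throwaway file, then decrypt it with the same key.
key = Fernet.generate_key()
demo_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(demo_path, "wb") as f:
    f.write(Fernet(key).encrypt(b"id,name\n1,Alice\n"))

decrypted_data = decrypt_file(demo_path, key)
print(decrypted_data)  # b'id,name\n1,Alice\n'
```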

Parameterization: parameterizing the variables above (File_Name, File_Path and Encryption Key)

What is Parameterization in Python

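Parameterization means passing values such as the file name, file path and key into your code as inputs instead of hard-coding them. It is an efficient way to reuse code and reduce code duplication.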

To install the cryptography package in Python:

pip install cryptography

Import the modules needed for encryption

Generate File Key

Code to access DBFS path

Specify the correct file path and name

Define File_Path, File_Name and Encryption Key
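A minimal sketch of defining the three parameters as plain variables; the values are placeholders. In a Databricks notebook you could instead read them from widgets (dbutils.widgets.text creates one, dbutils.widgets.get reads it), which is omitted here so the sketch runs anywhere. A real key should be loaded from wherever you stored it rather than regenerated on every run, otherwise older files can no longer be decrypted.

```python
import os
from cryptography.fernet import Fernet

# Placeholder parameter values; change these instead of editing the code below.
File_Path = "/dbfs/FileStore/tables"
File_Name = "sample.csv"
Encryption_Key = Fernet.generate_key()  # demo only: load your stored key in practice

full_path = os.path.join(File_Path, File_Name)
```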

Encrypt File
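With the parameters defined, the encryption step can take them as function arguments, so the same code works for any file. This is a sketch assuming Fernet and an in-place overwrite; the function name and demo values are hypothetical:

```python
import os
import tempfile
from cryptography.fernet import Fernet


def encrypt_file(file_path: str, file_name: str, encryption_key: bytes) -> bytes:
    """Encrypt file_path/file_name in place; all three inputs are parameters."""
    full_path = os.path.join(file_path, file_name)
    with open(full_path, "rb") as f:
        data = f.read()
    encrypted_data = Fernet(encryption_key).encrypt(data)
    with open(full_path, "wb") as f:
        f.write(encrypted_data)
    return encrypted_data


# Demo call with throwaway values standing in for the real parameters.
File_Path = tempfile.mkdtemp()
File_Name = "sample.csv"
Encryption_Key = Fernet.generate_key()
with open(os.path.join(File_Path, File_Name), "wb") as f:
    f.write(b"id,name\n1,Alice\n")

encrypted_data = encrypt_file(File_Path, File_Name, Encryption_Key)
```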

Decrypt File
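Likewise for decryption, again a sketch assuming Fernet and an in-place overwrite, with a hypothetical function name and demo values:

```python
import os
import tempfile
from cryptography.fernet import Fernet


def decrypt_file(file_path: str, file_name: str, encryption_key: bytes) -> bytes:
    """Decrypt file_path/file_name in place and return the plaintext."""
    full_path = os.path.join(file_path, file_name)
    with open(full_path, "rb") as f:
        encrypted = f.read()
    decrypted_data = Fernet(encryption_key).decrypt(encrypted)
    with open(full_path, "wb") as f:
        f.write(decrypted_data)
    return decrypted_data


# Demo: write an encrypted throwaway file, then decrypt it through the parameters.
File_Path = tempfile.mkdtemp()
File_Name = "sample.csv"
Encryption_Key = Fernet.generate_key()
with open(os.path.join(File_Path, File_Name), "wb") as f:
    f.write(Fernet(Encryption_Key).encrypt(b"id,name\n1,Alice\n"))

decrypted_data = decrypt_file(File_Path, File_Name, Encryption_Key)
print(decrypted_data)  # b'id,name\n1,Alice\n'
```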

To see the encrypted file:

print(encrypted_data)

To see the decrypted file:

print(decrypted_data)

In summary, you need a key both to encrypt a file and to decrypt it. The Python code above, run in Databricks, gets this task done. You can also read up on the difference between Python and PySpark code.
