Encrypting and decrypting files in Databricks with Python is one of those foundational patterns every data engineer should have in their toolbox. It sits at the intersection of security, governance, and day‑to‑day engineering productivity, especially when you’re dealing with PII, financial data, or regulated workloads.
What is Encryption
Encryption is a data security measure that encodes information by converting plaintext into ciphertext using a key.
In practice, encryption takes your data—like a message, file, or database record—and scrambles it so that it looks like random gibberish to anyone who intercepts it. Without the right key, that scrambled data cannot be meaningfully interpreted. When an authorized person or system needs to read the data, they use the corresponding key and decryption process to restore the plaintext.
Encryption is a core part of data security because it protects sensitive information from unauthorized access, theft, or tampering, whether the data is stored (“at rest”) or being transmitted over networks (“in transit”).
What is an Encryption Key
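In cryptography, encryption is a form of encoding information, and the encryption key is the component that controls it.
An encryption key is a string of bits used by a cryptographic algorithm to scramble (encrypt) and unscramble (decrypt) data, which makes it one of the most important components of data security.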
When data is encrypted, the algorithm takes the original data (plaintext) and the encryption key as inputs and produces ciphertext, which looks like random gibberish. To get the original data back, the correct key must be supplied during decryption; without that key, the ciphertext should be computationally infeasible to reverse.
There are two main ways encryption keys are used:
Symmetric encryption: the same key is used to encrypt and decrypt data. Everyone who needs to read the data must share this secret key and keep it protected.
Asymmetric encryption: there is a key pair, a public key (which can be shared) and a private key (which must stay secret). Data encrypted with one can only be decrypted with the other.
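To make the difference concrete, here is a minimal sketch using the cryptography package: Fernet for symmetric encryption and RSA with OAEP padding for asymmetric encryption. The sample message, the 2048-bit key size, and SHA-256 as the OAEP hash are illustrative choices, not requirements.

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

message = b"account=12345; balance=100.00"  # made-up sample data

# Symmetric: one shared secret key both encrypts and decrypts.
shared_key = Fernet.generate_key()
token = Fernet(shared_key).encrypt(message)
symmetric_result = Fernet(shared_key).decrypt(token)

# Asymmetric: anyone with the public key can encrypt,
# but only the holder of the private key can decrypt.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()
oaep = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)
ciphertext = public_key.encrypt(message, oaep)
asymmetric_result = private_key.decrypt(ciphertext, oaep)

print(symmetric_result == message, asymmetric_result == message)
```

RSA can only encrypt small payloads directly (a few hundred bytes at this key size), so for whole files the usual pattern is symmetric encryption of the data, optionally with the symmetric key itself protected by an asymmetric key.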
The strength of an encryption key depends on its length (number of bits) and randomness. Longer, more random keys are harder to guess or brute-force, which is why secure systems generate keys using cryptographic random number generators and store them in secure key management systems rather than hard-coding them into code or configuration.
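As an illustration of generating a key from a cryptographic random source, the sketch below shows Fernet's own helper next to an equivalent construction using Python's standard-library secrets module. A Fernet key is 32 random bytes (split internally into a 128-bit signing key and a 128-bit AES key), encoded as URL-safe base64.

```python
import base64
import secrets

from cryptography.fernet import Fernet

# Fernet's helper: 32 random bytes, URL-safe base64-encoded.
key = Fernet.generate_key()
print(len(key))  # 44 base64 characters encode 32 bytes

# Equivalent construction by hand using the standard-library CSPRNG.
raw = secrets.token_bytes(32)
manual_key = base64.urlsafe_b64encode(raw)

# Fernet validates the key format when constructed; a bad key raises ValueError.
fernet = Fernet(manual_key)
```

Either way, the resulting key belongs in a secret store, not in notebook source.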
What is Decryption
Decryption is the process of converting encrypted data back to its original, unencrypted form.
During decryption, a decryption algorithm and the correct key are applied to that ciphertext to reverse the encryption steps and recover the original data. This ensures that only authorized users or systems—those who possess the right key—can access and understand the protected information.
Encryption, keys, and why they matter
At its core, encryption is the process of converting readable data (plaintext) into an unreadable format (ciphertext) using a key. The same key (in symmetric encryption) is then used to reverse the process and restore the original content. As a data engineer, you should treat that key as the true “crown jewel”: if it’s lost, decryption is impossible; if it’s leaked, your entire encryption effort is effectively useless.
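A quick way to see why the key is the crown jewel: with the cryptography package's Fernet recipe, decrypting with any key other than the one used for encryption raises InvalidToken instead of returning data. A minimal sketch (the sample plaintext is made up):

```python
from cryptography.fernet import Fernet, InvalidToken

plaintext = b"customer_id=42; card_last4=1234"
right_key = Fernet.generate_key()
ciphertext = Fernet(right_key).encrypt(plaintext)

# The correct key restores the original bytes.
restored = Fernet(right_key).decrypt(ciphertext)
print(restored)

# Any other key, e.g. a new one generated after the original was lost, fails.
wrong_key = Fernet.generate_key()
try:
    Fernet(wrong_key).decrypt(ciphertext)
    wrong_key_worked = True
except InvalidToken:
    wrong_key_worked = False
    print("Decryption failed: wrong key")
```

Regenerating a key does not help: a lost key means the ciphertext stays unreadable.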
In Databricks, you typically use a Python crypto library (for example, Fernet from the cryptography package) to generate a random key, then store that key in a secure location rather than hard‑coding it. On shared platforms, you might temporarily write it to a file if you only have read access to DBFS, but in production you want it in a proper secret store (Databricks secrets, Key Vault, etc.), with tight access controls.
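As a sketch of the production approach, the helper below reads the key from a Databricks secret scope with dbutils.secrets.get, which is only available inside Databricks. The scope name, key name, and the FERNET_KEY environment-variable fallback for running outside Databricks are all hypothetical, and the secret itself is assumed to have been created beforehand (for example with the Databricks CLI).

```python
import os

from cryptography.fernet import Fernet


def load_fernet_key(scope: str, key_name: str, env_var: str = "FERNET_KEY") -> bytes:
    """Return a Fernet key from a Databricks secret scope, or from an env var elsewhere."""
    try:
        # dbutils is injected into Databricks notebooks and jobs only.
        value = dbutils.secrets.get(scope=scope, key=key_name)  # noqa: F821
    except NameError:
        # Hypothetical local fallback: key supplied via an environment variable.
        value = os.environ[env_var]
    return value.encode("utf-8")


def get_fernet(scope: str, key_name: str) -> Fernet:
    """Build a Fernet instance from the stored key (scope/key names are hypothetical)."""
    return Fernet(load_fernet_key(scope, key_name))


# Usage inside Databricks (hypothetical names):
# fernet = get_fernet("data-encryption", "fernet-key")
```

Databricks redacts secret values printed in notebook output, which is one more reason to prefer this over keeping the key in a file or variable you might display.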
To install the cryptography package in Python (in a Databricks notebook cell, run it as %pip install cryptography):
pip install cryptography
Import the cryptography-related modules

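The original import cell is not reproduced here. A plausible minimal set for the steps that follow, assuming they use the cryptography package's Fernet recipe plus standard-library file handling, is:

```python
import os          # building file paths and listing directories
import tempfile    # temporary file to hold the key when DBFS is read-only

import cryptography
from cryptography.fernet import Fernet  # symmetric, authenticated encryption

# Confirm the package is importable and check which version is installed.
print(cryptography.__version__)
```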
Generate Key for File Encryption

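The original key-generation code is not shown. A minimal sketch consistent with the note about using a temporary file: generate a Fernet key and write it to a temp file on the driver's local disk. The variable names key and key_path are illustrative.

```python
import tempfile

from cryptography.fernet import Fernet

# Generate a new random Fernet key (44 URL-safe base64 bytes).
key = Fernet.generate_key()

# Write the key to a temporary file; delete=False keeps the file after the
# handle is closed so later cells can read it back.
with tempfile.NamedTemporaryFile(delete=False) as key_file:
    key_file.write(key)
    key_path = key_file.name

print(key_path)  # the temp file's path, not the key itself
```

The driver's local disk does not outlive the cluster, so a key kept only in a temp file is lost when the cluster terminates, and with it any data encrypted under that key.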
The code above generates a key and stores it in a temporary file with a path like this: /tmp/tmpp7yct_de (this is the file's location, not the key itself).
Note: If you have only READ access (no WRITE access) to Databricks DBFS, create a temporary file as shown above to store the key.

Code to access DBFS file path

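Spark and dbutils address DBFS with dbfs:/ URIs, while plain Python file functions such as open() use the local FUSE mount under /dbfs on cluster configurations that expose it. The helper below is a hypothetical sketch that maps one form to the other; the example path is made up.

```python
def to_local_path(dbfs_uri: str) -> str:
    """Map a dbfs:/ URI to the /dbfs FUSE path used by open(), os, etc."""
    prefix = "dbfs:/"
    if dbfs_uri.startswith(prefix):
        return "/dbfs/" + dbfs_uri[len(prefix):]
    return dbfs_uri  # already a local path; leave it unchanged


print(to_local_path("dbfs:/FileStore/tables/customers.csv"))
# /dbfs/FileStore/tables/customers.csv
```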
To specify the correct file path (add this if needed)

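One way to make sure the path is correct before encrypting is to join the directory and file name and fail early if the file is missing. This is a hypothetical helper; the directory and file name in the usage comment are placeholders.

```python
import os


def resolve_file_path(directory: str, file_name: str) -> str:
    """Join directory and file name, raising FileNotFoundError if the file is missing."""
    path = os.path.join(directory, file_name)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No such file: {path}")
    return path


# Hypothetical location; adjust to your workspace:
# file_path = resolve_file_path("/dbfs/FileStore/tables", "customers.csv")
```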
Encrypt the file using the generated key



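The original encryption cell is not shown. The self-contained sketch below creates stand-ins for the key and input file (a throwaway temp directory and a made-up CSV), then reads the file as bytes, encrypts it with Fernet, and writes the result to a new file. The .encrypted suffix is an assumed naming convention.

```python
import os
import tempfile

from cryptography.fernet import Fernet

# Stand-ins for the key and input file from the earlier steps (hypothetical).
key = Fernet.generate_key()
work_dir = tempfile.mkdtemp()
file_path = os.path.join(work_dir, "customers.csv")
with open(file_path, "wb") as f:
    f.write(b"id,name\n1,Alice\n2,Bob\n")

fernet = Fernet(key)

# Read the whole file as bytes and encrypt it.
with open(file_path, "rb") as f:
    original_data = f.read()
encrypted_data = fernet.encrypt(original_data)

# Write the encrypted token to a separate file (naming is an assumption).
encrypted_path = file_path + ".encrypted"
with open(encrypted_path, "wb") as f:
    f.write(encrypted_data)

print(encrypted_path)
```

Fernet works on the whole file in memory, which is fine for typical config or extract files but not for very large ones.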
List the file directory (add this code if needed)

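To confirm the encrypted file was written, a simple listing is enough. This hypothetical helper uses os.listdir on a local or /dbfs path; inside Databricks, dbutils.fs.ls with a dbfs:/ URI lists the same location.

```python
import os


def list_files(directory: str) -> list[str]:
    """Return the sorted names of regular files in a local (or /dbfs) directory."""
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )


# Databricks alternative (path is hypothetical):
# display(dbutils.fs.ls("dbfs:/FileStore/tables/"))
```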
Decrypt the encrypted file


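The original decryption cell is not shown either. This self-contained sketch builds a stand-in encrypted file, then reads it back and decrypts it with the same Fernet key. The file names are hypothetical.

```python
import os
import tempfile

from cryptography.fernet import Fernet

# Stand-ins for the key and encrypted file from the previous steps (hypothetical).
key = Fernet.generate_key()
fernet = Fernet(key)
work_dir = tempfile.mkdtemp()
encrypted_path = os.path.join(work_dir, "customers.csv.encrypted")
with open(encrypted_path, "wb") as f:
    f.write(fernet.encrypt(b"id,name\n1,Alice\n2,Bob\n"))

# Read the encrypted bytes and decrypt them with the SAME key.
with open(encrypted_path, "rb") as f:
    encrypted_data = f.read()
decrypted_data = fernet.decrypt(encrypted_data)  # raises InvalidToken on a wrong key

# Write the recovered plaintext to a new file.
decrypted_path = os.path.join(work_dir, "customers_decrypted.csv")
with open(decrypted_path, "wb") as f:
    f.write(decrypted_data)

print(decrypted_data.decode("utf-8"))
```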
Parameterization – parameterize the variables above (File_Name, File_Path and Encryption Key)
What is Parameterization in Python
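Parameterization means passing values such as the file path, file name, and encryption key into your code as parameters instead of hard-coding them. The same code can then be reused for different files, which reduces duplication.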
To install the cryptography package in Python (in a Databricks notebook cell, run it as %pip install cryptography):
pip install cryptography
Import the cryptography-related modules

Generate File Key


Code to access DBFS path

Specify the correct file path and name


Define File_Path, File_Name and Encryption Key

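Databricks widgets are one common way to parameterize a notebook. The sketch below reads File_Path and File_Name from text widgets when run in Databricks and falls back to defaults elsewhere; the default values are hypothetical. The key is deliberately not a widget, since widget values are visible in the notebook; the FERNET_KEY environment variable is a hypothetical stand-in for a secret lookup, and a fresh key is generated only so the sketch runs standalone.

```python
import os

from cryptography.fernet import Fernet


def get_param(name: str, default: str) -> str:
    """Read a Databricks text widget, falling back to a default outside Databricks."""
    try:
        dbutils.widgets.text(name, default)  # noqa: F821  (creates the widget)
        return dbutils.widgets.get(name)     # noqa: F821
    except NameError:  # dbutils is undefined outside Databricks
        return default


# Hypothetical defaults; override them from the widget bar or job parameters.
File_Path = get_param("File_Path", "/dbfs/FileStore/tables")
File_Name = get_param("File_Name", "customers.csv")

# Hypothetical stand-in for a secret lookup; generating a key here means data
# encrypted in this run can only be decrypted with this same key.
Encryption_Key = os.environ.get("FERNET_KEY", "").encode() or Fernet.generate_key()
```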
Encrypt File


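With the parameters defined, encryption can be wrapped in a function that takes them as arguments. This is a hypothetical sketch: it writes <file_name>.encrypted next to the source file and returns the encrypted bytes so they can be printed as encrypted_data.

```python
import os

from cryptography.fernet import Fernet


def encrypt_file(file_path: str, file_name: str, key: bytes) -> bytes:
    """Encrypt file_path/file_name, write file_name + '.encrypted' beside it, return the bytes."""
    source = os.path.join(file_path, file_name)
    with open(source, "rb") as f:
        encrypted_data = Fernet(key).encrypt(f.read())
    with open(source + ".encrypted", "wb") as f:
        f.write(encrypted_data)
    return encrypted_data


# encrypted_data = encrypt_file(File_Path, File_Name, Encryption_Key)
```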
Decrypt File

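The matching decryption function takes the same parameters. Again a hypothetical sketch: it expects <file_name>.encrypted in file_path, writes the plaintext to decrypted_<file_name>, and returns the decrypted bytes.

```python
import os

from cryptography.fernet import Fernet


def decrypt_file(file_path: str, file_name: str, key: bytes) -> bytes:
    """Decrypt file_path/file_name + '.encrypted', write decrypted_<file_name>, return the bytes."""
    encrypted_source = os.path.join(file_path, file_name + ".encrypted")
    with open(encrypted_source, "rb") as f:
        decrypted_data = Fernet(key).decrypt(f.read())  # InvalidToken on a wrong key
    with open(os.path.join(file_path, "decrypted_" + file_name), "wb") as f:
        f.write(decrypted_data)
    return decrypted_data


# decrypted_data = decrypt_file(File_Path, File_Name, Encryption_Key)
```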
To see the encrypted data
print(encrypted_data)
To see the decrypted data
print(decrypted_data)
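In summary, encrypting and decrypting a file are two halves of one process, and both need the same key: whoever holds it can read the data, and without it the data cannot be recovered. The Python code above, run in a Databricks notebook, handles both steps. Note that this is plain Python executed on the driver rather than PySpark; it is worth reading up on the difference between Python and PySpark code.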