AWS Lambda Function
# encrypt_caesar_cipher.py
def enhanced_caesar_cipher(text, shift):
"""
Encrypts a string using an improved Caesar cipher.
Args:
text: The string to be encrypted.
shift: The number of positions to shift the characters.
Returns:
The encrypted string.
"""
result = ""
for char in text:
# Encrypt all printable ASCII characters (32 to 126)
if 32 <= ord(char) <= 126:
result += chr((ord(char) - 32 + shift) % 95 + 32)
else:
# Keep non-printable characters as is
result += char
return result
def lambda_handler(event, context):
"""
AWS Lambda handler function for encrypting multiple strings in batch.
"""
encrypted=[]
# iterate over each row in the batch
for row in event["data"]:
index = row[0] # the virtual index
value = str(row[1]) # The first column in the ON clause
# the result row contains the virtual index and the encrypted result
encrypted.append([index, enhanced_caesar_cipher(value, 3)])
return {'data': encrypted}
UDCF Execution using seller_name as first input argument
SELECT count(*) FROM CaesarEncrypt(
ON (SELECT seller_name FROM INDB_freddiemac_rows.loan_info_1M)
) AS EN_DATA;
Count(*) 1000000
The CaesarEncrypt table operator UDCF takes seller_name as the first input argument from the input table loan_info_1M. The lambda_handler function for the table operator also follows the Data Format but there can be multiple rows and multiple fields. The first element of the row is a virtual index value. In this example, the seller_name column is the second element of the row. There is no OUTPUT clause in the query, so the result set schema is the same as the input (a single character column of size 20). An equivalent query with an explicit OUTPUT clause looks like:
SELECT count(encrypted_seller_name) FROM CaesarEncrypt(
ON (SELECT seller_name FROM INDB_freddiemac_rows.loan_info_1M)
USING
OUTPUT('[{"encrypted_seller_name":"char(20)"}]')
) AS EN_DATA;
Count(*) 1000000
The columns in the ON clause get sent to the lambda_handler’s event["data"] in position order.
UDCF Execution using Encryption
In this example, multiple columns are encrypted and the result is joined back to the table using the unique id column.
First, modify the table operator so that it passes the ID field first and then the columns to be encrypted. This lets you pass the ID column back in the result and use it to join to the original table:
SELECT top 10 * FROM CaesarEncrypt(
ON (SELECT cast(id as CHAR(6)), seller_name, loan_sequence_number
FROM INDB_freddiemac_rows.loan_info_1M)
USING
OUTPUT('[{"id":"char(6)"}, {"encrypted_seller_name":"char(20)"}, {"encrypted_loan_sequence_number":"varchar(18)"}]')
) AS EN_DATA;
Add ID (a BIGINT) and loan_sequence_number to the ON clause input to the table operator. The ID is cast as a char(6) since the table operator currently does not support BIGINT types. The OUTPUT clause also has the return schema of the table operator.
Pass back the ID unmodified, the encrypted seller_name and then the encrypted loan_sequence_number.
Next, modify the AWS Lambda function to handle the new input and output constraints:
# encrypt_caesar_cipher.py
def enhanced_caesar_cipher(text, shift):
"""
Encrypts a string using an improved Caesar cipher.
Args:
text: The string to be encrypted.
shift: The number of positions to shift the characters.
Returns:
The encrypted string.
"""
result = ""
for char in text:
# Encrypt all printable ASCII characters (32 to 126)
if 32 <= ord(char) <= 126:
result += chr((ord(char) - 32 + shift) % 95 + 32)
else:
# Keep non-printable characters as is
result += char
return result
def lambda_handler(event, context):
"""
AWS Lambda handler function for encrypting multiple strings in batch.
"""
encrypted=[]
# iterate over each row in the batch
for row in event["data"]:
index = row[0] # the virtual index
id_col = row[1] # the id
# build the current output row according to Data Format and OUTPUT clause
output_row = [index, id_col]
# encrypt the rest of the columns in the ON clause
for col in row[2:]:
output_row.append(enhanced_caesar_cipher(str(col), 3))
# the result row contains the virtual index and the encrypted result
encrypted.append(output_row)
return {'data': encrypted}
Finally, JOIN the result back to the original table using the unique ID column:
SELECT top 10 t.id, t.seller_name, et.encrypted_seller_name, t.servicer_name, et.encrypted_servicer_name
FROM SELECT * FROM CaesarEncrypt(
ON (SELECT cast(id as CHAR(6)), seller_name, servicer_name
FROM INDB_freddiemac_rows.loan_info_1M)
USING
OUTPUT('[{"id":"char(6)"}, {"encrypted_seller_name":"char(20)"}, {"encrypted_servicer_name":"char(20)"}]')
) AS et
JOIN INDB_freddiemac_rows.loan_info_1M as t
ON et.id = t.id
ORDER BY 1;