

Part 2: Migrating to a new DynamoDB data model with Python scripts

Projects Jul 03 2022

This blog post is Part 2 in a series on refactoring my daily Chinese vocab app to a single-table design in Amazon DynamoDB. In this post, I walk through exporting data from my existing DynamoDB table, reformatting it to fit the new data model, and importing it into a new DynamoDB table using Python scripts. While my goal was to migrate to a single-table design, this strategy works for any major change you want to make to your data model.

Check out Part 1 to learn about why I chose single-table design and my design process.

Note that the data migration strategy described here works best for applications that can handle downtime. Maybe you are still in the development phase of your application, or your application is like mine and has predictable usage patterns. If your application is in constant use and you are concerned about losing data during a migration, an approach like using DynamoDB Streams to continue capturing updates during your migration may be a better fit.
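
For reference, a rough sketch of that alternative (which I did not use here) could be a Lambda function subscribed to the existing table's stream, replaying inserts and updates into the new table while you migrate; the function name and key mapping below are illustrative only:

import boto3
from boto3.dynamodb.types import TypeDeserializer

deserializer = TypeDeserializer()
new_table = boto3.resource('dynamodb').Table('ImportDataTable')

def handler(event, context):
    # Replay inserts and updates from the old table's stream into the new table
    for record in event['Records']:
        if record['eventName'] not in ('INSERT', 'MODIFY'):
            continue
        # Stream images arrive in DynamoDB JSON, e.g. {'SubscriberEmail': {'S': '...'}}
        item = {k: deserializer.deserialize(v)
                for k, v in record['dynamodb']['NewImage'].items()}
        # Re-map old attributes to the new data model before writing (illustrative keys only)
        new_table.put_item(Item={
            'PK': 'USER#' + item['SubscriberEmail'],
            'SK': 'USER#' + item['SubscriberEmail'],
            'emailAddress': item['SubscriberEmail']
        })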


Data migration walkthrough

This is the data migration process we will walk through:

  1. Create a new DynamoDB table, while keeping the existing table.
  2. Run a script to export the existing table data to a JSON file.
  3. Run a script to load the JSON file data, complete any transformations needed, and upload the data to the new table in the new format.
  4. Verify the data was imported correctly to the new table (see the count-comparison sketch after this list), keep the data export from Step 2 around just in case, and only then update your application to read and write from the new table and delete the existing table.
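
For Step 4, a minimal way to sanity-check the import is to compare item counts between the two tables. This is only a sketch; it assumes the table names used later in this post and doesn't verify individual attributes:

import boto3

dynamodb = boto3.client('dynamodb', region_name='us-east-1')

def count_items(table_name):
    # Paginate through a COUNT scan to total up the items in a table
    count = 0
    kwargs = {'TableName': table_name, 'Select': 'COUNT'}
    while True:
        response = dynamodb.scan(**kwargs)
        count += response['Count']
        if 'LastEvaluatedKey' not in response:
            return count
        kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']

print("Existing table items:", count_items('ExportDataTable'))
print("New table items:", count_items('ImportDataTable'))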

At this stage, I already have my new data model planned out (as discussed in Part 1).

I have the existing DynamoDB table with data in the old data model, and I've created a new, empty table with the primary key and secondary index configuration for my new data model. I've defined and deployed both tables in my application's infrastructure-as-code framework template (I use SAM). As a further protection against accidentally deleting tables and losing data before I'm ready, I set the deletion policy for all my DynamoDB tables to 'Retain'.

AWSTemplateFormatVersion: '2010-09-09'
Transform: 'AWS::Serverless-2016-10-31'
Description: An example deployment for migrating data between DynamoDB tables
Resources:
  ExistingTable:
    Type: AWS::DynamoDB::Table
    DeletionPolicy: Retain
    Properties:
      TableName: ExportDataTable
      AttributeDefinitions:
        - AttributeName: SubscriberEmail
          AttributeType: S
      KeySchema:
        - AttributeName: SubscriberEmail
          KeyType: HASH
      BillingMode: PAY_PER_REQUEST
      ProvisionedThroughput:
        ReadCapacityUnits: 0
        WriteCapacityUnits: 0
  NewTable:
    Type: AWS::DynamoDB::Table
    DeletionPolicy: Retain
    Properties:
      TableName: ImportDataTable
      AttributeDefinitions:
        - AttributeName: PK
          AttributeType: S
        - AttributeName: SK
          AttributeType: S
        - AttributeName: GSI1PK
          AttributeType: S
        - AttributeName: GSI1SK
          AttributeType: S
      KeySchema:
        - AttributeName: PK
          KeyType: HASH
        - AttributeName: SK
          KeyType: RANGE
      GlobalSecondaryIndexes:
        - IndexName: GSI1
          KeySchema:
            - AttributeName: GSI1PK
              KeyType: HASH
            - AttributeName: GSI1SK
              KeyType: RANGE
          Projection:
            ProjectionType: ALL
      BillingMode: PAY_PER_REQUEST
      ProvisionedThroughput:
        ReadCapacityUnits: 0
        WriteCapacityUnits: 0

Now I export my existing DynamoDB table data to a JSON file. In this example, we will migrate user profile data. This is the existing data model format for user data:

{
 "SubscriberEmail": "example@mail.com",
 "CharacterSet": "simplified",
 "DateSubscribed": "2021-06-11T00:00:00"
}

Here is the script I use to export my existing data to a JSON file:

import json
import boto3

# Specify your table name and the region your table is in
export_table_name = "ExportDataTable"
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table(export_table_name)

def export_table_data():
    print(f"Exporting data from {export_table_name}")
    response = table.scan()
    data = response['Items']
    # Paginate through DynamoDB table response
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
        data.extend(response['Items'])
    # Create or open a text file to save exported data
    with open(f"data_export_{export_table_name}.json", "w+") as f:
        json_data = json.dumps(data)
        f.write(json_data)
    print("Export complete!")

export_table_data()
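
One caveat: the boto3 resource API returns DynamoDB number attributes as Python Decimal objects, which json.dumps can't serialize by default. The user items in this example are all strings, so the script works as-is; if your table has numeric attributes, one quick workaround (assuming string versions of those numbers are acceptable in the export file) is:

# Stringify Decimal (and any other non-JSON) values during serialization
json_data = json.dumps(data, default=str)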

This is the format for user data following my new data model:

{
 "PK": "USER#123456-123-456-789-3e8876eedb2d",
 "SK": "USER#123456-123-456-789-3e8876eedb2d",
 "characterSetPreference": "simplified",
 "dateCreated": "2021-06-11T00:00:00",
 "emailAddress": "example@mail.com",
 "GSI1PK": "USER",
 "GSI1SK": "USER#123456-123-456-789-3e8876eedb2d",
 "lastLogin": "2021-12-16T00:05:46.529742",
 "userAlias": "小沈",
 "userAliasEmoji": "🦊",
 "userAliasPinyin": "xiǎo shěn"
}

And here is an example of my script to upload data in the new format. This script takes pieces of user data from the previous data model (ex: SubscriberEmail, CharacterSet) and remaps them to new fields. It also creates new empty or 'Not set' fields for data I did not previously save and that users will fill in later (ex: lastLogin, userAlias). I've included an example of where you can add additional migration or data transformation steps, in my case generating Cognito IDs for each user.

This script makes a DynamoDB PutItem API call for each user. For migrating large amounts of data, you may want to explore bulk upload options, such as batching the writes (see the sketch after the script below).

import json
import boto3

# Specify your existing (export) and new (import) table names and the region your tables are in
export_table_name = "ExportDataTable"
import_table_name = "ImportDataTable"
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table(import_table_name)

def import_table_data():
    table_data_export = read_data_from_export_file()
    # You can add functions here if you need to do any additional data transformations
    # For example, for my migration I needed to generate Cognito IDs for each user
    # table_data_export = generate_cognito_ids(table_data_export)
    write_data_to_dynamodb(table_data_export)

def read_data_from_export_file():
    # Read data from the DynamoDB export file you created with the export script
    table_data_export = []
    with open(f"data_export_{export_table_name}.json", "r") as f:
        contents = f.read()
        table_data_export = json.loads(contents)
    return table_data_export

def write_data_to_dynamodb(table_data_export):
    failed_users_list = []
    succeeded_users_count = 0
    for user in table_data_export:
        try:
            # For each item in the export table, put a new item in the import table using the new data model structure
            # Use a ConditionExpression to only put the item if a user with the same cognito_id does not already exist
            response = table.put_item(
                Item={
                    'PK': "USER#" + user['cognito_id'],
                    'SK': "USER#" + user['cognito_id'],
                    'emailAddress': user['SubscriberEmail'],
                    'dateCreated': user['DateSubscribed'],
                    'lastLogin': "",
                    'userAlias': "Not set",
                    'userAliasPinyin': "Not set",
                    'userAliasEmoji': "Not set",
                    'characterSetPreference': user['CharacterSet'],
                    'GSI1PK': "USER",
                    'GSI1SK': "USER#" + user['cognito_id']
                },
                ConditionExpression='attribute_not_exists(PK)'
            )
            print(f"Created user in DynamoDB {user['cognito_id']}. Response: {response['ResponseMetadata']['HTTPStatusCode']}")
            succeeded_users_count += 1
        except Exception as e:
            print(f"Error: Failed to create user in DynamoDB, {user['cognito_id']}. Error: {e}")
            failed_users_list.append(user)
    if failed_users_list:
        print('Failed users: ', failed_users_list)
    print('Succeeded users count: ', succeeded_users_count)
    print('Import completed!')
    return

# Example of an additional migration step
def generate_cognito_ids(table_data_export):
    # Create Cognito profile for each user and append Cognito IDs as cognito_id to user data
    return table_data_export

import_table_data()
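
As mentioned above, the import script makes one PutItem call per user, which is fine at my scale. For larger migrations, one option is boto3's batch_writer, which buffers writes into BatchWriteItem calls behind the scenes. The following is only a sketch of an alternative write_data_to_dynamodb; note that batched writes don't support the ConditionExpression used above:

def write_data_to_dynamodb_bulk(table_data_export):
    # Buffer puts into BatchWriteItem calls (no per-item condition checks)
    with table.batch_writer() as batch:
        for user in table_data_export:
            batch.put_item(Item={
                'PK': "USER#" + user['cognito_id'],
                'SK': "USER#" + user['cognito_id'],
                'emailAddress': user['SubscriberEmail'],
                'dateCreated': user['DateSubscribed'],
                'lastLogin': "",
                'userAlias': "Not set",
                'userAliasPinyin': "Not set",
                'userAliasEmoji': "Not set",
                'characterSetPreference': user['CharacterSet'],
                'GSI1PK': "USER",
                'GSI1SK': "USER#" + user['cognito_id']
            })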

I migrated my data model in stages to make it easier to manage - first user data, then vocab list data. Throughout, it was helpful to keep referencing my new data model design in NoSQL Workbench: as I migrated data and wrote new application code, I regularly went back to the complete data model with sample data that I had created during my design phase.


I hope these data migration scripts are helpful for your projects! Check back soon for the last post in this series, which will cover my approach to parsing DynamoDB single-table design data into a format that's easier to work with in your application logic layer and frontend.

🌻
