Part 2: Migrating to a new DynamoDB data model with Python scripts

Projects Jul 03 2022

This blog post is Part 2 in a series on refactoring my daily Chinese vocab app to a single-table design in Amazon DynamoDB. In this post, I walk through exporting data from my existing DynamoDB table, reformatting it to fit the new data model, and importing it into a new DynamoDB table using Python scripts. While my goal was to migrate to a single-table design, this strategy works for any major change you want to make to your data model.

Check out Part 1 to learn about why I chose single-table design and my design process.

Note that the data migration strategy described here works best for applications that can handle downtime. Maybe you are still in the development phase of your application, or your application is like mine and has predictable usage patterns. If your application is in constant use and you are concerned about losing data during a migration, an approach like using DynamoDB Streams to continue capturing updates during your migration may be a better fit.

Data migration walkthrough

This is the data migration process we will walk through:

  1. Create a new DynamoDB table, while keeping the existing table.
  2. Run a script to export the existing table data to a JSON file.
  3. Run a script to load the JSON file data, complete any transformations needed, and upload the data to the new table in the new format.
  4. Verify that the data was imported correctly into the new table (and maybe keep the data export from Step 2 just in case). Only then update your application to read from and write to the new table, and delete the existing table.
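
For Step 4, one cheap sanity check is to compare the number of items in the export file with the number of items in the new table. The following is a minimal sketch, not a full verification: the function names are my own for illustration, `table` is assumed to be a boto3 DynamoDB `Table` resource, and the check assumes the new table holds only the items you just migrated. A matching count does not prove every attribute was mapped correctly, so spot-checking a few items is still worthwhile.

```python
import json


def count_items(table):
    """Count items in a table with a paginated Scan that returns counts only."""
    total = 0
    scan_kwargs = {"Select": "COUNT"}
    while True:
        response = table.scan(**scan_kwargs)
        total += response["Count"]
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return total
        # More pages remain: continue from where the previous page stopped.
        scan_kwargs["ExclusiveStartKey"] = last_key


def counts_match(export_path, new_table):
    """Return True if the export file and the new table hold the same number of items."""
    with open(export_path, encoding="utf-8") as f:
        exported = len(json.load(f))
    return exported == count_items(new_table)


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   new_table = boto3.resource("dynamodb").Table("NewTableName")
#   print(counts_match("user_export.json", new_table))
```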

At this stage, I already have my new data model planned out (as discussed in Part 1).

I have the existing DynamoDB table with data in the old data model, and I've created a new, empty table with the primary key and secondary index configuration for my new data model. I've defined and deployed both tables in my application's infrastructure-as-code template (I use AWS SAM). As further protection against accidentally deleting a table and losing its data before I'm ready, I set the deletion policy for all my DynamoDB tables to 'Retain'.

Now I export my existing DynamoDB table data to a JSON file. In this example, we will migrate user profile data. This is the existing data model format for user data:

{
  "SubscriberEmail": "",
  "CharacterSet": "simplified",
  "DateSubscribed": "2021-06-11T00:00:00"
}

Here is the script I use to export my existing data to a JSON file:
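
A minimal sketch of such an export, not necessarily identical to my original script. It assumes `table` is a boto3 DynamoDB `Table` resource; the function names, table name, and output filename are placeholders chosen for illustration. A single Scan call returns at most 1 MB of data, so the loop follows `LastEvaluatedKey` until every page has been read.

```python
import json
from decimal import Decimal


def scan_all_items(table):
    """Read every item from a table, following Scan pagination."""
    items = []
    scan_kwargs = {}
    while True:
        response = table.scan(**scan_kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return items
        # More pages remain: continue from where the previous page stopped.
        scan_kwargs["ExclusiveStartKey"] = last_key


def decimal_default(value):
    """boto3 returns DynamoDB numbers as Decimal, which json cannot serialize."""
    if isinstance(value, Decimal):
        return int(value) if value % 1 == 0 else float(value)
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def export_table(table, path):
    """Write all items in the table to a JSON file and return the item count."""
    items = scan_all_items(table)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(items, f, default=decimal_default, ensure_ascii=False, indent=2)
    return len(items)


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   table = boto3.resource("dynamodb").Table("ExistingTableName")
#   print(export_table(table, "user_export.json"))
```

Note that this sketch only handles strings and numbers; set or binary attributes would need their own conversion before they can be written as JSON.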

This is the format for user data following my new data model:

{
  "PK": "USER#123456-123-456-789-3e8876eedb2d",
  "SK": "USER#123456-123-456-789-3e8876eedb2d",
  "characterSetPreference": "simplified",
  "dateCreated": "2021-06-11T00:00:00",
  "emailAddress": "",
  "GSI1PK": "USER",
  "GSI1SK": "USER#123456-123-456-789-3e8876eedb2d",
  "lastLogin": "2021-12-16T00:05:46.529742",
  "userAlias": "小沈",
  "userAliasEmoji": "🦊",
  "userAliasPinyin": "xiǎo shěn"
}

And here is an example of my script to upload data in the new format. This script takes pieces of user data from the previous data model (e.g., SubscriberEmail, CharacterSet) and remaps them to new fields. It also creates new empty or 'Not set' fields for data I did not previously save and which will be created by the user (e.g., lastLogin, userAlias). It also shows how you can add further migration or data transformation steps; in my case, generating a Cognito ID for each user.
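
A minimal sketch of the remapping and upload, under a few loudly stated assumptions: `table` is a boto3 DynamoDB `Table` resource for the new table, the function names are illustrative, and the choice of which new fields start empty versus 'Not set' is a guess. The Cognito step is not reproduced here; `make_user_id` is a hypothetical hook that defaults to a random UUID as a stand-in, and you would replace it with whatever creates or looks up the real Cognito ID.

```python
import json
import uuid
from decimal import Decimal


def to_new_user_item(old_item, user_id):
    """Remap one user item from the old data model to the new one."""
    user_key = f"USER#{user_id}"
    return {
        "PK": user_key,
        "SK": user_key,
        "emailAddress": old_item["SubscriberEmail"],
        "characterSetPreference": old_item["CharacterSet"],
        "dateCreated": old_item["DateSubscribed"],
        "GSI1PK": "USER",
        "GSI1SK": user_key,
        # Fields the old model never stored. These placeholder values are
        # assumptions; the user fills them in later through the app.
        "lastLogin": "",
        "userAlias": "Not set",
        "userAliasEmoji": "Not set",
        "userAliasPinyin": "Not set",
    }


def migrate_users(export_path, table, make_user_id=lambda old_item: str(uuid.uuid4())):
    """Load exported users, transform each one, and write it to the new table."""
    with open(export_path, encoding="utf-8") as f:
        # boto3 rejects Python floats, so parse any JSON floats as Decimal.
        old_items = json.load(f, parse_float=Decimal)
    for old_item in old_items:
        # make_user_id is a stand-in for the Cognito ID step (hypothetical hook).
        new_item = to_new_user_item(old_item, make_user_id(old_item))
        table.put_item(Item=new_item)  # one PutItem call per user
    return len(old_items)


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   new_table = boto3.resource("dynamodb").Table("NewTableName")
#   migrate_users("user_export.json", new_table)
```

Keeping the remapping in its own function makes it easy to check the output for a handful of users before writing anything to the new table.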

This script makes one DynamoDB PutItem API call per user. For migrating large amounts of data, you may want to explore bulk upload options such as the BatchWriteItem API, which accepts up to 25 put or delete requests per call.
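
As one possible bulk option, boto3's `Table.batch_writer()` buffers items and sends them with BatchWriteItem behind the scenes, including resending any unprocessed items. This is a sketch rather than what I ran for my own migration; `upload_in_batches` is an illustrative name, and `items` is assumed to be an iterable of dicts already in the new format.

```python
def upload_in_batches(table, items):
    """Write items through boto3's batch writer and return how many were queued."""
    count = 0
    # The context manager flushes any remaining buffered items on exit.
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
            count += 1
    return count


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   new_table = boto3.resource("dynamodb").Table("NewTableName")
#   upload_in_batches(new_table, new_items)
```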

I migrated my data model in stages to make it easier to manage: first user data, then vocab list data. Throughout, I kept referring back to my new data model design in NoSQL Workbench, including the complete model with sample data I created during the design phase, both while migrating data and while writing new application code.

I hope these data migration scripts are helpful for your projects! Check back soon for the last post in this series, which will cover my approach to parsing DynamoDB single-table design data for easier handling in the application logic layer and frontend of your application.

