Database upgrades are performed in place rather than by creating a new database copy
- Preserve all unknown data in extension tables
- Ensure errors and exceptions are properly generated and brought to the upgrader's attention during migration if the upgrade conflicts with an extension or any other customization
As a secondary concern, this upgrade method was chosen to ease the upgrade process for a Cloud-based ODS (e.g., on Azure).
- Users who have extension tables (or any other schema with foreign key dependencies on the edfi schema) will be notified, and must explicitly acknowledge this by adding the BypassExtensionValidationCheck option at the command line when upgrading. This ensures that the installer is aware that custom upgrade scripts may be required
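For context, the kind of dependency this check looks for can be listed with SQL Server's catalog views. The query below is only a sketch of that idea and is not the migration utility's actual validation logic:

```sql
-- Illustrative sketch only: list foreign keys owned by non-edfi schemas (extensions,
-- custom tables) that reference edfi tables -- the dependencies the extension
-- validation check warns about before an in-place upgrade.
SELECT fk.name                   AS ForeignKeyName,
       SCHEMA_NAME(t.schema_id)  AS DependentSchema,
       t.name                    AS DependentTable,
       rt.name                   AS ReferencedEdfiTable
FROM   sys.foreign_keys fk
JOIN   sys.tables t  ON t.object_id  = fk.parent_object_id
JOIN   sys.tables rt ON rt.object_id = fk.referenced_object_id
WHERE  SCHEMA_NAME(rt.schema_id) = 'edfi'
  AND  SCHEMA_NAME(t.schema_id) <> 'edfi';
```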
Sequence of events that occur during upgrade
Specifics differ for each version, but in general the upgrade sequence executes as follows:
- Validate user input at the command line
- Create tempdata (tables/stored procedures) that will be used for upgrade
- Check current ODS data for upgrade compatibility, and display action messages to the user if existing data requires changes
- Before modifying the edfi schema: calculate and store hash values for primary key data that is expected NOT to change during upgrade (a sketch of this hashing approach appears after this list)
- Drop views, constraints, stored procs
- Import descriptor data from XML
- Create all new tables for this version that did not previously exist
- Update data in existing tables
- Drop old tables
- Create views/constraints/stored procs for the new version
- Validation check: recalculate the hash codes generated previously, and make sure that all data that is not supposed to change was not mistakenly modified
- Drop all temporary migration data
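The hash steps above act as a data-loss safety net: a checksum of key data is captured before the edfi schema is touched, then recomputed and compared once the upgrade is complete. The real checks are implemented as reusable stored procedures shipped with the migration tooling; the sketch below only illustrates the idea, and the table, column, and migration_tempdata names are assumptions:

```sql
-- Illustrative sketch only: capture a hash of data that must survive the upgrade
-- unchanged, then recompute it afterwards and compare.
-- 1) Before any edfi schema changes (indexes are still in place, so this is cheap):
SELECT 'edfi.Student' AS TableName,
       CHECKSUM_AGG(CHECKSUM(StudentUSI, StudentUniqueId)) AS KeyDataHash
INTO   migration_tempdata.PreUpgradeHashes      -- hypothetical temp schema/table
FROM   edfi.Student;

-- 2) After the schema upgrade is complete, recompute the same hash and compare;
--    any row returned here means data that should have been untouched was modified.
SELECT pre.TableName
FROM   migration_tempdata.PreUpgradeHashes AS pre
CROSS APPLY (SELECT CHECKSUM_AGG(CHECKSUM(StudentUSI, StudentUniqueId)) AS KeyDataHash
             FROM edfi.Student) AS post
WHERE  pre.KeyDataHash <> post.KeyDataHash;
```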
Minimize the number of scripts with complex dependencies on other scripts in the same directory/upgrade step.
- Compatibility checks are designed to run before any changes to the edfi schema have been made. This prevents the user from having to deal with a half-upgraded database while making updates
- Initial hash codes used for data validation also must be generated before touching the edfi schema to ensure accuracy.
- It is also better for performance to do this step while all of our indexes are still present
- Dropping of constraints, views, etc. is taken care of before making any schema changes, to prevent unexpected SQL exceptions
- New descriptors are imported as an initial step before making changes to the core tables. This ensures that all new descriptor data is available in advance for reference during updates
- After creation of descriptors, the sequence of the next steps is designed to ensure that all data sources exist unmodified on the old schema where we expect them to be:
- Create tables that are brand new to the schema only (and populate them with existing data)
- Modify existing tables (add/drop columns, etc)
- Drop old tables no longer needed
- Foreign keys, constraints, etc. are all added back in once the new table structure is fully in place.
- Once the edfi schema is fully upgraded and will receive no further changes, we can perform the final data validation check.
The v3 upgrade is a good example case to demonstrate the upgrade steps working together, due to its larger scale:
- For v3: All foreign keys and other constraints were dropped during this upgrade in order to adopt the new naming conventions
- Also for v3: ODS types were replaced with new descriptors. This change impacted nearly every table on the existing schema
One script per table in each directory, where possible
Scripts are named in the format:
#### TableName [optional_tags].sql
This convention does not apply to operations that are performed dynamically
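For example, a script that modifies a hypothetical Student table and rebuilds its constraints might be named something like "0040 Student [recreate constraints].sql" (a made-up name, purely for illustration); the numeric prefix presumably controls the order in which scripts execute within a directory.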
Troubleshooting, Timeout prevention
Custom, unknown extensions on the ODS are common. As part of the process of upgrading a highly customized ODS, an installer is likely to run into a SQL exception somewhere in the middle of the upgrade (usually caused by a foreign key dependency, schema-bound view, etc.).
In general, we do not want to attempt to modify an unknown/custom extension on an installer's behalf to try and prevent this from happening. It is important that an installer be aware of each and every change applied to their custom tables. Migration of custom extensions will be handled by the installer.
Considering the above, in the event an exception does occur during upgrade, we want to make the troubleshooting process as easy as possible. If an exception is thrown, an installer should immediately be able to tell:
- Which table was being upgraded when it occurred (from the file name)
- What were the major changes being applied (from the file tags)
- What went wrong (from the exception message)
- Where to find the code that caused it
Many issues may be fixable from the above information alone. If more detail is needed, the installer can view the code in the referenced script file. By separating script changes by table, we make an effort to ensure that there are only a few lines to look through (rather than hundreds)
In addition, each script will be executed in a separate transaction. Operations such as index creation can take a long time on some tables with a large ODS. Splitting the code into separate transactions helps prevent unexpected timeout events
The major downside of this approach is the large number of files it can produce. For example, the v3 upgrade was a case where all existing tables saw modifications, so this convention generated a change script for every table in more than one directory.
With updates becoming more frequent, future versions should not be impacted as heavily.
Most change logic is held in SQL scripts (as of v3)
As of v3, most of the upgrade logic is performed from the SQL scripts rather than by using the .NET-based upgrade utility to write database changes directly. Most upgrade tasks are simple enough that they can be executed straight from SQL (given a few stored procedures to get started).
Given this advantage, effort was made to ensure that each part of the migration tool (console utility, library, integration tests) could be replaced individually as needed
The current upgrade utility contains a library making use of DbUp to drive the upgrade process. In the future, if/when this tool no longer suits our needs, we should be able to take existing scripting and port it over to an alternative upgrade tool (such as RoundhousE), or even a custom built tool if the need ever arises.
This convention could (and should) change in the future if upgrade requirements become too complex to execute from SQL scripting alone.
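As a rough illustration, the change below is typical of what is simple enough to remain in plain SQL (edfi.SampleTable, OldColumn, and NewColumn are hypothetical names, not taken from a real upgrade script):

```sql
-- Illustrative sketch only: a single-table change that adds a replacement column,
-- backfills it from the old column, then drops the old column.
ALTER TABLE edfi.SampleTable ADD NewColumn NVARCHAR(60) NULL;
GO

UPDATE edfi.SampleTable
SET    NewColumn = OldColumn;
GO

ALTER TABLE edfi.SampleTable DROP COLUMN OldColumn;
GO
```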
Two types of data validation/testing options
- Dynamic, SQL based
- Ensures data that is expected to remain the same does not change
- Can run on any ODS in the field even if the data is unknown
- Integration tests
- Runs on a known, given set of inputs
- Used to test logic in areas where changes should occur
Prevent data loss
The first type of validation (dynamic, SQL-based) is executed on data that we know should never change during the upgrade.
- The source and destination tables do not need to be the same. This validation type is most commonly used to verify that data was correctly moved to the expected destination table during upgrade
- Can be executed on any field ODS to ensure that unknown datasets do not cause unexpected data loss during upgrade
- The data in these tables does not need to be known
The second type of data validation, integration test based, is used to test the logic and transformations where we know the data should change:
- For example, during the v3 upgrade, descriptor namespaces are converted from the 2.x "http://EdOrg/Descriptor/Name.xml" format to the "uri://EdOrg/Name" format. Integration tests are created to ensure that the upgrade logic is functioning correctly for several known inputs
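A sketch of what such a conversion could look like in SQL is shown below; the table and column names are assumptions, and the real upgrade scripts (and the integration tests that cover them) define the actual logic:

```sql
-- Illustrative sketch only: rewrite a 2.x-style descriptor namespace
--   http://EdOrg/Descriptor/Name.xml  ->  uri://EdOrg/Name
-- (edfi.Descriptor and Namespace are hypothetical names here.)
UPDATE edfi.Descriptor
SET Namespace =
        'uri://'
        + SUBSTRING(Namespace, 8, CHARINDEX('/Descriptor/', Namespace) - 8)                          -- "EdOrg"
        + '/'
        + REPLACE(SUBSTRING(Namespace, CHARINDEX('/Descriptor/', Namespace) + 12, 256), '.xml', '')  -- "Name"
WHERE Namespace LIKE 'http://%/Descriptor/%.xml';
```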
Together, the two validation types (validation of data that changes, and validation of data that does not change) can be used to create test coverage wherever it is needed for a given ODS upgrade.
The dynamic validation is performed via reusable stored procedures that are already created and available during upgrade.
See scripts in the "*Source Validation Check" and "*Destination Validation Check" directories for example usages.