Azure SQL Data Sync is a service built on Azure SQL Database that allows to synchronize data from one or many disparate SQL data sources either on-premises or Azure SQL Databases to a single Azure SQL Database (called as Hub) and vice versa. The data can be synced manually or automatically on a set schedule, anywhere from once every five minutes to once a month.
Data Sync is based around the concept of a Sync Group. A Sync Group is a group of databases that you want to synchronize.
The following properties of Sync Group has to be set to setup the sync:
- Sync Schema: The tables and columns to be sync.
- Sync Direction: Direction can be uni-directional or bi-directional. That is, the Sync Direction can be Hub to Memberor Member to Hub, or both.
- Sync Interval:How often synchronization occurs (anywhere from once every five minutes to once a month).
- Conflict Resolution Policy:Defines who wins in case of a conflict: Hub wins or Member wins.
Data Sync service uses a hub and spoke topology to synchronize data. One of the databases in the group is defined as the Hub Database. The rest of the databases are member databases. Sync always occurs only between the Hub and individual members. Even while synchronizing two on premise databases the sync must be through Hub database.
- Hub Database must be an Azure SQL Database.
- Member databases can be either SQL Databases, on-premises SQL Server databases, or SQL Server instances on Azure virtual machines.
- Sync Database contains the metadata and log for Data Sync. The Sync Database has to be an Azure SQL Database located in the same region as the Hub Database.
How does Azure Data Sync works?
- Tracking data changes:Data Sync tracks changes using insert, update, and delete triggers. The changes are recorded in a side table in the user database.
- Synchronizing data:Data Sync is designed in a Hub and Spoke model. The Hub syncs with each member individually. Changes from the Hub are downloaded to the member and then changes from the member are uploaded to the Hub.
- Resolving conflicts:Data Sync provides two options for conflict resolution, Hub wins or Member wins.
- If you select Hub wins, the changes in the hub always overwrite changes in the member.
- If you select Member wins, the changes in the member overwrite changes in the hub. If there’s more than one member, the final value depends on which member syncs first.
Setting up Sync Service
- Install the Azure SQL Data Sync Client Agent on the machine(s) that have the databases that need to be synced. Provide the username and password to allow agent to register your reference database with Azure, during installation. The account must have “log on as a service” rights in Azure and read/write access to reference database(s).
- Setup a Sync Agent in Azure.
- Get the key from the Sync Agent on Azure and paste into the Sync Agent Client.
- Register reference database(s) through the Sync Agent Client interface.
- In Azure, create a Sync Group that will define what your Hub database and Spoke (Reference) databases will be.
- Setup Sync Rules for the group.
- This is where you choose:
- The schema that the sync service will use as its definition for the sync process. It’s not necessary for Azure SQL database to have schema at this point. The sync job will create schema itself.
- Tables and columns to be synced.
- Filters to use in the sync process (e.g. “where id >= 1000”).
- Finally, manually sync your data or setup the interval for automatic sync.
The Azure SQL Sync team has published several limitations that you need to be aware of before setting up a sync job. Below is listed the most common issues:
- The maximum character length of any database, table, schema, or column is 50 characters.
- There can be no more than 500 tables across all databases included in the sync group.
- There can be no more than 1000 columns in a single table in a sync group.
- The maximum number of endpoints across all sync groups is 30.
- The max data row size on any table in the sync group is 24Mb.
- If more than one database in the sync group already contains data before the initial sync, then every row will be treated as a conflict during the original sync and will require resolution, which on very large data sets could take from hours to months according to Microsoft’s documentation. So, Azure SQL Data Sync is not for really big data.
Microsoft has documented the process pretty thoroughly here.