Information Systems: Simplify Your Data Synchronization Process with Dapper and Bulk Insert in ASP.NET

Introduction:

Data synchronization is the process of maintaining consistent and accurate data across multiple systems or devices. In modern software systems, data synchronization has become increasingly important due to the proliferation of data and the need to ensure that information is accurate and up-to-date. In today's world, businesses and organizations rely heavily on data to make informed decisions, and any inaccuracies or inconsistencies in that data can lead to serious consequences.

Code Example

Importance of Data Synchronization:

Data synchronization is critical in modern software systems for several reasons:

Data Consistency: In any system where data is used across multiple devices or platforms, ensuring data consistency is essential. Data synchronization ensures that any changes made to the data in one system are automatically reflected in all other systems, ensuring that all users have access to the most up-to-date and accurate information.
Improved Productivity: When data is synchronized across all systems, it reduces the need for manual data entry, which can be time-consuming and prone to errors. This can improve overall productivity by freeing up time for employees to focus on other tasks.
Enhanced Collaboration: Data synchronization allows for improved collaboration between team members, regardless of their location. With synchronized data, team members can work together in real-time, sharing information and making decisions based on the most up-to-date information available.
Disaster Recovery: In the event of a system failure or data loss, data synchronization can be used to quickly restore data from a backup, reducing the impact of any downtime and minimizing the risk of data loss.

In conclusion, data synchronization is a crucial component of modern software systems. It ensures data consistency, improves productivity, enhances collaboration, and enables efficient disaster recovery. As businesses and organizations continue to rely on data to make informed decisions, data synchronization will become increasingly important in ensuring that that data is accurate, up-to-date, and readily available.

Data synchronization is the process of ensuring that data is consistent and up-to-date across multiple systems or databases. In many software applications, data is stored in multiple databases, and it is essential to ensure that changes made to the data in one database are propagated to all the other databases in a timely and accurate manner. This is where the dataSync service comes in handy.

The dataSync service uses a combination of SQL queries and bulk copy operations to synchronize data between the two databases. It first retrieves the data from the source database using a SELECT statement and applies any filters and sorting rules specified by the user. The service then compares the data with the destination database to identify any changes or additions. It filters out any data that already exists in the destination database and applies any updates or additions to the destination database using bulk copy operations.

Overall, the dataSync service offers a reliable and efficient way to synchronize data between two databases in limited scenarios. It helps to ensure that data is consistent and up-to-date across multiple systems, which is essential for modern software applications.

Code Example
SyncTable method:

The method is declared with several input parameters that define the source and destination databases, table, and other settings.


public async Task SyncTable(string connectionString, string destinationConnectionString, string tableName, string schema, Func<dynamic, bool> filterExpression = null, List<string> columns = null, bool bulkInsert = true , string whereClause = null, string orderByClause = null, int? top = null)

Inside the method, two SqlConnection objects are created for the source and destination databases.


using SqlConnection sourceConnection = new SqlConnection(connectionString);
using SqlConnection destinationConnection = new SqlConnection(destinationConnectionString);

The next step is to get the matched columns from the source and destination databases, as well as the primary key columns for the destination table.


string[] sourceColumns = await GetMatchedColumns(sourceConnection, null, tableName, schema) ?? Array.Empty<string>();
string[] destinationColumns = await GetMatchedColumns(destinationConnection, sourceColumns, tableName, schema) ?? Array.Empty<string>();
IEnumerable<string> primaryKeyColumns = columns ?? await GetPrimaryKeyColumns(destinationConnection, tableName, schema);

A SELECT statement is built using the source table and columns, as well as any additional filter expressions, sorting, or limit settings specified in the input parameters.


var selectCommand = BuildSelectCommand(sourceColumns, tableName, schema , whereClause , orderByClause , top , primaryKeyColumns);

If a filter expression is provided, it is appended to the SELECT statement.


if (filterExpression != null)
{
    selectCommand += $" WHERE {filterExpression.Method.Name}()";
}

The SELECT statement is executed against the source database, and the resulting entities are stored in a List.


List<dynamic> sourceEntities = (await sourceConnection.QueryAsync<dynamic>(selectCommand)).ToList();

If there are no entities returned from the SELECT statement, the method returns without further action.


if (!(sourceEntities?.Any() ?? false))
{
    return;
}

The existing primary key values for the destination table are retrieved and stored in a List.


var existingDestinationIds = await destinationConnection.QueryAsync<dynamic>($"SELECT {string.Join(", ", primaryKeyColumns)} FROM {schema}.{tableName}");
var existingDestinationKeys = existingDestinationIds.ToList();

The source entities are filtered to remove any that already exist in the destination table, based on matching primary key values.


var filteredSourceEntities = sourceEntities.Where(x =>
{
    var xDict = x as IDictionary<string, object>;
    return xDict != null && !existingDestinationKeys.Any(y =>
    {
        var yDict = y as IDictionary<string, object>;
        return yDict != null && primaryKeyColumns.All(k => xDict[k].Equals(yDict[k]));
    });
}).ToList();

If there are any filtered source entities remaining, they are either bulk copied to the destination table using SqlBulkCopy, or inserted one at a time, depending on the input parameter settings.


if (filteredSourceEntities?.Any() ?? false)
{
    if (bulkInsert)
    {
        SqlBulkCopy sqlBulkCopy = await BulkCopyAsync(tableName, schema, destinationConnectionString, destinationConnection, sourceColumns, destinationColumns, filteredSourceEntities);
    }
    else
    {
        await CopyOneByOneAsync(tableName, schema, destinationConnection, destinationColumns, filteredSourceEntities);
    }
}

It's important to note that the code snippet provided is just an example of how data synchronization can be achieved using Dapper and bulk insert. Depending on your specific use case, there may be other ways to achieve data synchronization, such as using SQL's MERGE statement to update or insert records.

There are also other useful tools and references available for data synchronization, such as Microsoft's Sync Framework and third-party tools like SymmetricDS and Flyway. These tools offer additional functionality and can be used to synchronize data across multiple databases, servers, and platforms.

In conclusion, data synchronization is an essential component of modern software systems, and using a service like dataSync can help ensure smooth and error-free data transfer between databases. By understanding the key concepts and best practices of data synchronization, developers can build robust and reliable applications that meet the needs of their users.
Code Example