Sync Data from Azure SQL Database to Azure Database for PostgreSQL with Azure Pipelines & Python
On one hand, Azure SQL Database is the intelligent, scalable relational database-as-a-service (DBaaS) built for the Azure cloud by Microsoft. It is evergreen and always up to date, with AI-powered, automated features that optimize performance and durability. On the other hand, Azure Database for PostgreSQL is the fully managed, intelligent and scalable relational database service for PostgreSQL by Microsoft Azure. It supports valuable Postgres features including JSONB, geospatial support, rich indexing and many more.
Often, one may need to synchronize data between two different databases, on a schedule, for different computational purposes. Below are the steps to set up a scheduled sync from Azure SQL Database to Azure Database for PostgreSQL with Azure Pipelines and a Python script:
1. Create a Python script that connects to both databases, with raw queries to SELECT from Azure SQL Database and then UPSERT to Azure Database for PostgreSQL
Create a Python file named sync.py with the code below. Replace the values within the <> brackets as needed.
import sys

import pyodbc
import psycopg2


class DataSync:
    def __init__(self):
        # Connection to the source Azure SQL Database; credentials are
        # passed as positional command-line arguments
        self.conn1 = pyodbc.connect(driver='{ODBC Driver 17 for SQL Server}',
                                    server=sys.argv[1],
                                    database=sys.argv[2],
                                    uid=sys.argv[3],
                                    pwd=sys.argv[4])
        # Connection to the target Azure Database for PostgreSQL
        self.conn2 = psycopg2.connect(database=sys.argv[5],
                                      host=sys.argv[6],
                                      user=sys.argv[7],
                                      password=sys.argv[8])
        self.cur1 = self.conn1.cursor()
        self.cur2 = self.conn2.cursor()

    def select_from_sql_db(self):
        self.cur1.execute('''SELECT <columns>
                             FROM <table>
                             WHERE <conditions>''')
        rows = self.cur1.fetchall()
        print('Total Records to be synced:', len(rows))
        return rows

    def upsert_to_az_postgres(self, data):
        count_insert = 0
        count_update = 0
        for row in data:
            # Check whether the row already exists in the target table
            query = '''SELECT count(*) FROM <table>
                       WHERE <conditions>'''
            self.cur2.execute(query % (<row indices>))
            count = self.cur2.fetchone()[0]
            if count == 0:
                query = '''INSERT INTO <table>
                           (<columns>)
                           VALUES (<values as %s>)'''
                self.cur2.execute(query, row)
                count_insert = count_insert + 1
            else:
                query = '''UPDATE <table>
                           SET <columns each as = '%s'>
                           WHERE <conditions>'''
                self.cur2.execute(query % (<row indices>))
                count_update = count_update + 1
        self.conn1.commit()
        self.conn2.commit()
        print('Number of records inserted:', count_insert)
        print('Number of records updated:', count_update)


obj = DataSync()
obj.upsert_to_az_postgres(obj.select_from_sql_db())
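The insert-vs-update decision above can be exercised in isolation, without a live database. Below is a minimal sketch (names hypothetical) that simulates the target table as an in-memory dict keyed on the match column:

```python
def upsert(table, rows, key_index=0):
    """Insert rows whose key is absent, update rows whose key exists.

    `table` is a dict standing in for the target table, keyed on the
    column used in the WHERE <conditions> check; `key_index` is the
    position of that key within each row tuple.
    """
    inserted = updated = 0
    for row in rows:
        key = row[key_index]
        if key not in table:   # mirrors the count(*) == 0 branch
            inserted += 1
        else:                  # row already present: overwrite it
            updated += 1
        table[key] = row
    return inserted, updated

target = {1: (1, 'old')}
ins, upd = upsert(target, [(1, 'new'), (2, 'fresh')])
print('inserted:', ins, 'updated:', upd)
```

Running this inserts the row with key 2 and updates the row with key 1, matching the counters sync.py prints at the end of a run.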
2. Create a new Azure Pipeline for Release in Azure DevOps Services.
In the Pipeline section, add a new Artifact and select the Python script file created in the first step. Here I pushed the file to an Azure Repos Git repository.
In Stages, add a new stage (call it Sync) and then add the job and the tasks needed (given in the steps below).
In the Tasks section, create a deployment process and add an agent job with the Agent pool set to 'Azure Pipelines' and the Agent Specification set to 'ubuntu-16.04' or any preferred version.
3. Create the tasks in the Azure Pipeline for Release
Add a Use Python Version task and select Version spec 3.7 or any preferred version. Check the 'Add to PATH' checkbox to prepend the retrieved Python version to the PATH environment variable, making it available to subsequent tasks and scripts without using the output variable.
Below is the YAML version of the above task:
steps:
- task: UsePythonVersion@0
  displayName: 'Use Python 3.7'
  inputs:
    versionSpec: 3.7
Next create the second task as a Bash script to install Microsoft ODBC Driver 17. Select the Task version as 3.* and write the below commands in the inline Script section:
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql17
Below is the YAML version of the above bash task:
steps:
- bash: |
    curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
    curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
    sudo apt-get update
    sudo ACCEPT_EULA=Y apt-get install -y msodbcsql17
  displayName: 'Install Microsoft ODBC Driver 17'
Then add a Command line task to install pip packages. Select the Task version as 2.* and add the below lines in the Script section:
pip install pyodbc
pip install psycopg2==2.8.6
In the Working Directory field, select the directory where the Python script resides.
Below is the YAML version of the command line task:
steps:
- script: |
    pip install pyodbc
    pip install psycopg2==2.8.6
  workingDirectory: '$(System.DefaultWorkingDirectory)/script'
  displayName: 'Pip Installs Packages'
Lastly, add a Python Script task to run the Python file. Select Task version 0.*, Script source as 'File path', and fill in the path to the script in the Script path field. In the Arguments section, add any arguments to be passed to the script execution; they are available through sys.argv.
Below is the YAML version of the Python script task (removed the arguments section as it holds sensitive data):
steps:
- task: PythonScript@0
  displayName: 'Run Python Script'
  inputs:
    scriptPath: '$(System.DefaultWorkingDirectory)/script/sync.py'
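The script reads its connection settings from sys.argv in a fixed positional order, so the task's Arguments field must supply them in exactly that order. A small helper like the following (field names illustrative, not part of sync.py) makes the expected order explicit:

```python
# Positional order expected by sync.py (sys.argv[1] through sys.argv[8])
ARG_NAMES = ['sql_server', 'sql_database', 'sql_user', 'sql_password',
             'pg_database', 'pg_host', 'pg_user', 'pg_password']

def parse_args(argv):
    """Map the eight positional arguments (sys.argv[1:]) onto named fields."""
    if len(argv) != len(ARG_NAMES):
        raise SystemExit(f'expected {len(ARG_NAMES)} arguments, got {len(argv)}')
    return dict(zip(ARG_NAMES, argv))

cfg = parse_args(['myserver.database.windows.net', 'mydb', 'sqluser', 'sqlpass',
                  'pgdb', 'myserver.postgres.database.azure.com', 'pguser', 'pgpass'])
print(cfg['pg_host'])
```

In the pipeline, the sensitive values would come from secret variables rather than being typed inline.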
4. Schedule the Azure Pipeline for Release trigger
The good thing about Azure Pipelines is that releases can be scheduled to trigger automatically. To schedule a release trigger, select the 'Schedule not set' icon on the Artifacts box in the Pipeline section. It will open a drawer on the right.
In the 'Scheduled release trigger' drawer, set the toggle to Enabled. Click the '+ Add a new time' button to set the days of the week and the time (hour and minute) with timezone for the scheduled trigger. The same can be enabled in the 'Pre-deployment conditions' of the created pipeline stage. Then click 'Save' at the top right of the pipeline to save the changes. The sync job will now trigger at the scheduled time.
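The steps above apply to classic release pipelines configured through the UI. If the pipeline is instead defined in YAML, the equivalent schedule can be declared with a cron block; the branch name and cron expression below are assumptions to adapt as needed:

```yaml
schedules:
- cron: "0 2 * * *"        # every day at 02:00 UTC
  displayName: Daily data sync
  branches:
    include:
    - master
  always: true             # run even when there are no new changes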
The pipeline can be tested or manually triggered using the ‘Create release’ button on the top-right corner of the page.
Hope this helps. Happy coding!