Sync Data from Azure SQL Database to Azure Database for PostgreSQL with Azure Pipelines & Python

Kunal Aich
Python in Plain English
5 min read · Jan 17, 2021


On one hand, Azure SQL Database is the intelligent, scalable relational database-as-a-service (DBaaS) built for the Azure cloud by Microsoft. It is evergreen and always up to date, with AI-powered, automated features that optimize performance and durability. On the other hand, Azure Database for PostgreSQL is Microsoft Azure's fully managed, intelligent and scalable relational database service for PostgreSQL. It supports valuable Postgres features including JSONB, geospatial support, rich indexing and many more.

Often, one may need to synchronize data between two different databases on a schedule for different computational purposes. Below are the steps to sync (and schedule) data from Azure SQL Database to Azure Database for PostgreSQL with Azure Pipelines and a Python script:

1. Create a Python script to connect to both databases, with raw queries to SELECT from Azure SQL Database and then UPSERT to Azure Database for PostgreSQL

Create a Python file named sync.py with the code below. Replace the values within the <> brackets as needed.

import sys

import pyodbc
import psycopg2


class DataSync:
    def __init__(self):
        # Connection to Azure SQL Database
        self.conn1 = pyodbc.connect(driver='{ODBC Driver 17 for SQL Server}',
                                    server=sys.argv[1],
                                    database=sys.argv[2],
                                    uid=sys.argv[3],
                                    pwd=sys.argv[4])
        # Connection to Azure Database for PostgreSQL
        self.conn2 = psycopg2.connect(database=sys.argv[5],
                                      host=sys.argv[6],
                                      user=sys.argv[7],
                                      password=sys.argv[8])
        self.cur1 = self.conn1.cursor()
        self.cur2 = self.conn2.cursor()

    def select_from_sql_db(self):
        self.cur1.execute('''SELECT <columns>
                             FROM <table>
                             WHERE <conditions>''')
        rows = self.cur1.fetchall()
        print('Total records to be synced:', len(rows))
        return rows

    def upsert_to_az_postgres(self, data):
        count_insert = 0
        count_update = 0
        for row in data:
            # Check whether the record already exists in PostgreSQL
            query = '''SELECT count(*) FROM <table>
                       WHERE <conditions>'''
            self.cur2.execute(query % (<row indices>))
            count = self.cur2.fetchone()[0]
            if count == 0:
                query = '''INSERT INTO <table> (<columns>)
                           VALUES (<values as %s>)'''
                self.cur2.execute(query, row)
                count_insert += 1
            else:
                query = '''UPDATE <table>
                           SET <columns each as = %s>
                           WHERE <conditions>'''
                self.cur2.execute(query % (<row indices>))
                count_update += 1
        self.conn1.commit()
        self.conn2.commit()
        print('Number of records inserted:', count_insert)
        print('Number of records updated:', count_update)


obj = DataSync()
obj.upsert_to_az_postgres(obj.select_from_sql_db())
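As a side note, the count-then-insert/update loop above makes two round trips per row, and interpolating values with the % operator is vulnerable to SQL injection. On PostgreSQL 9.5+ the same upsert can be done in a single INSERT ... ON CONFLICT DO UPDATE statement, letting psycopg2 bind each row's values safely. Below is a minimal, hedged sketch; the build_upsert helper and the table/column names are illustrative, not part of the original script:

```python
def build_upsert(table, columns, key_columns):
    """Build an INSERT ... ON CONFLICT DO UPDATE statement with %s
    placeholders for use with psycopg2's cursor.execute(query, row)."""
    placeholders = ', '.join(['%s'] * len(columns))
    updates = ', '.join('{0} = EXCLUDED.{0}'.format(c)
                        for c in columns if c not in key_columns)
    return ('INSERT INTO {t} ({cols}) VALUES ({ph}) '
            'ON CONFLICT ({keys}) DO UPDATE SET {upd}').format(
                t=table, cols=', '.join(columns), ph=placeholders,
                keys=', '.join(key_columns), upd=updates)

# Hypothetical table and columns, for illustration only:
query = build_upsert('target_table', ['id', 'name', 'qty'], ['id'])
print(query)
# Then, per row fetched from Azure SQL: cur2.execute(query, row)
```

Note that ON CONFLICT requires a unique constraint or primary key on the conflict columns in the target PostgreSQL table.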

2. Create a new Azure Pipeline for Release in Azure DevOps Services.

In the Pipeline section, add a new Artifact to select the Python script file created in the first step. Here I pushed the file to an Azure Repos Git repository.

Add an artifact to the pipeline

In the Stages, add a new stage (call it Sync) and then add the job and the tasks needed (given in the steps below).

Stages of the pipeline

In the Tasks section, create a deployment process and add an agent job with Agent pool as ‘Azure Pipelines’ and Agent Specification as ‘ubuntu-16.04’ or any preferred version.

Agent job in the pipeline

3. Create the tasks in the Azure Pipeline for Release

Add a Use Python Version task and select Version spec as 3.7 or any preferred version. Check the ‘Add to PATH’ checkbox to prepend the retrieved Python version to the PATH environment variable, making it available to subsequent tasks and scripts without using the output variable.

Use Python Version task

Below is the YAML version of the above task:

steps:
- task: UsePythonVersion@0
  displayName: 'Use Python 3.7'
  inputs:
    versionSpec: 3.7

Next, create the second task as a Bash script to install Microsoft ODBC Driver 17. Select the Task version as 3.* and write the commands below in the inline Script section (the redirection into /etc/apt needs root, hence sudo tee):

curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql17

Bash task

Below is the YAML version of the above bash task:

steps:
- bash: |
    curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
    curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
    sudo apt-get update
    sudo ACCEPT_EULA=Y apt-get install -y msodbcsql17
  displayName: 'Install Microsoft ODBC Driver 17'

Then add a Command line task to install pip packages. Select the Task version as 2.* and add the below lines in the Script section:

pip install pyodbc
pip install psycopg2==2.8.6

In the Working Directory field, select the directory where the Python script resides.

Command line task

Below is the YAML version of the command line task:

steps:
- script: |
    pip install pyodbc
    pip install psycopg2==2.8.6
  workingDirectory: '$(System.DefaultWorkingDirectory)/script'
  displayName: 'Pip Installs Packages'

Lastly, add the Python script task to run the Python file. Select Task version as 0.*, Script source as ‘File path’, and enter the script path in the Script path section. In the Arguments section, add the arguments (if any) passed to the script execution, available through sys.argv.

Python script task

Below is the YAML version of the Python script task (the arguments section is removed as it holds sensitive data):

steps:
- task: PythonScript@0
  displayName: 'Run Python Script'
  inputs:
    scriptPath: '$(System.DefaultWorkingDirectory)/script/sync.py'

All the tasks on the Agent job of the Deployment process
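Since sync.py reads everything positionally from sys.argv, the Arguments field must list the eight values in exactly the order the script expects. A small sketch of that mapping (the names and sample values below are hypothetical; map_args is an illustrative helper, not part of the pipeline):

```python
# The eight positional arguments sync.py reads from sys.argv[1:], in order.
ARG_NAMES = ['sql_server', 'sql_database', 'sql_user', 'sql_password',
             'pg_database', 'pg_host', 'pg_user', 'pg_password']

def map_args(argv):
    """Pair each positional pipeline argument with the setting it supplies."""
    if len(argv) != len(ARG_NAMES):
        raise ValueError('expected {} arguments, got {}'.format(
            len(ARG_NAMES), len(argv)))
    return dict(zip(ARG_NAMES, argv))

# e.g. an Arguments field of
#   myserver.database.windows.net mydb sqluser **** pgdb myserver.postgres.database.azure.com pguser ****
# maps to:
settings = map_args(['myserver.database.windows.net', 'mydb', 'sqluser', 'xxxx',
                     'pgdb', 'myserver.postgres.database.azure.com', 'pguser', 'yyyy'])
print(settings['pg_host'])
```

In practice the passwords should come from pipeline variables marked as secret (e.g. $(sqlPassword)) rather than being typed into the Arguments field in plain text.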

4. Schedule the Azure Pipeline for Release trigger

The good thing about Azure Pipelines is that it can be scheduled to trigger releases. To schedule a release trigger, select the ‘Schedule not set’ icon on the Artifacts box of the Pipeline section. It will open a right drawer.

Disabled Scheduled release trigger

In the ‘Scheduled release trigger’ drawer, set the toggle to Enabled. Click the ‘+ Add a new time’ button to set the days of the week and time (hour & minute) with time zone for the scheduled trigger. The same can be enabled in the ‘Pre-deployment conditions’ of the created pipeline stage. Then click ‘Save’ at the top right of the pipeline to save the changes. The sync job will now trigger at the scheduled times.

Enabled Scheduled release trigger

The pipeline can be tested or manually triggered using the ‘Create release’ button on the top-right corner of the page.

Create release option

Hope this helps. Happy coding!
