Fully Realize the Value Of Your Datasets With Julia & Azure SQL DW.

Get the Code

Connect to Azure SQL DW with Julia ODBC driver and docker containers

For those who don’t know the great Julia programing language, Julia is a high-level, high-performance dynamic programming language for numerical computing. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. Julia’s Base library, largely written in Julia itself, also integrates mature, best-of-breed open source C and Fortran libraries for linear algebra, random number generation, signal processing, and string processing; simply put, Julia is a fantastic programing language to do advanced data analysis.

That is why Azure and its advanced data services like Azure SQL DW or Azure SQL DB are the perfect companions for Julia programs, using Azure SQL DW from Julia and its ODBC driver is a very easy process, in this article I will show you how you can run Julia program in a docker container and connect to Azure SQL DW with a one command experience.

Azure SQL Data Warehouse is a massively parallel processing (MPP) cloud-based, scale-out, relational database capable of processing massive volumes of data, by using Azure SQL DW with Julia will get the following benefits :
Fully elastic DW : SQL Data Warehouse provides the ability to pause when you are not using it, which stops the billing of compute resources. Another key feature is the ability to scale resources. Pausing and Scaling can be done via the Azure portal or through PowerShell commands
Polybase allows you to leverage your data from different sources by using familiar T-SQL commands. Polybase enables you to query non-relational data held in Azure Blob storage as though it is a regular table.
Accelerate your queries : Azure SQL DW supports Columnstore index , the columnstore index can speed up the performance of analytics queries, they offer an order of magnitude better performance than a btree index while significantly reducing the storage footprint, the data compression achieved depends on the schema and the data, but we see around 10x data compression on average when compared to rowstore with no compression.
• Advanced threat detection , auditing , Transparent data encryption.

Step 1 : Create an Azure SQL DW

Before you start creating services on Azure, you need an Azure subscription, if you don’t have one, you can get a free trial here.
You then can manually create an Azure SQL DW using the Azure Portal or a script and the Azure CLI 2.0, as you can see the Azure CLI is based on python, and you can even run it in a docker container with the following command : docker run azuresdk/azure-cli-python.
Before executing the script, you just have to specify what parameters you want to use :
# Create Resource Group and Datawarehouse


Then the script will create the Azure objects:

# Create Resource Group
az group create --location $LOCATION --name $RESOURCE_GROUP_NAME
# Create Azure SQL Server
az sql server create --administrator-login-password $ADMIN_PASSWORD
--administrator-login $ADMIN_USER
--location $LOCATION
--resource-group $RESOURCE_GROUP_NAME

# Get Public IP
MY_IP=$(curl ipinfo.io/ip)

Create Firewall rule so you can connect from your IP
az sql server firewall create --end-ip-address $MY_IP
--start-ip-address $MY_IP
--resource-group $RESOURCE_GROUP_NAME
--name "OFFICE01"

# Create Azure SQL DW , this will create a DW with 100 DWU
az sql dw create --name $RESOURCE_GROUP_NAME
--resource-group $RESOURCE_GROUP_NAME
# [--collation COLLATION]
# [--max-size MAX_SIZE]
# [--service-objective SERVICE_OBJECTIVE]

That is it, with these basic script you have a fully elastic PETABYTE scale data warehouse in Azure! Migrating datasets to Azure SQL DW is also an easy process, you have all the info Migrate Your Data to Azure SQL DW, for the purpose of this test we will use a basic DB and a very small dataset that we will generate on the fly.

Step 2 : Run Julia and the Julia ODBC on a docker container.

I have included the Docker file in this repo so you can understand what it is required, you can also use the built gonzaloruiz/azuresqljulia provided in the Docker Hub.

Building the Docker container is relatively easy , the first part is fully based on the official Julia Docker container , with a minor caveat, in my case I use Ubuntu as a base as I am more familiar with it.
After that, we just need to install the official SQL Server ODBC driver in the Docker container, my container is based on the official SQL Server for Linux docker container; although this container also installs SQL Server , which we don’t need for our scenario, I just extracted how to install the ODBC SQL Driver for Linux and omitted the back-end part.
The only tricky part of this process was to learn how to properly configure locales in a docker container, I solved this with the following lines :

# install necessary locales for ODBC
RUN apt-get install -y locales
&& echo "en_US.UTF-8 UTF-8" > /etc/locale.gen
&& locale-gen

Step 3 : Install the ODBC and additional Julia packages on the docker container.

I added the Julia packages that I needed in the docker container, so running it is faster because we don’t have to download them when we start the container :

# install Julia packages
RUN julia -e 'Pkg.add("ODBC");Pkg.add("ArgParse")'

Step 4 : Run it, and Love it !

After this, you have a julia docker container ready to connect to Azure SQL databases, you would just need to build the connection string and start the docker container :

SQL_CONNECTRION_STRING=$( printf "Driver={ODBC Driver 13 for SQL Server};Server=tcp:%s.database.windows.net,1433;Database=%s;Uid=%s@%s;Pwd=%s;Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"

# Now we are ready to use the provided Julia & ODBC docker container
docker run -it -v $(pwd):/usr/azuresqljulia -e SQL_CONNECTION_STRING="${SQL_CONNECTRION_STRING}" gonzaloruiz/azuresqljulia

This last command will start the docker container and pass the connection string as environment variable, last step, from the container you just need to run the provided demo Julia script :

julia /usr/azuresqljulia/azureodbc.jl "${SQL_CONNECTION_STRING}"

That is it!! You are ready to do amazing things with large datasets in an efficient and secure manner :)

The Julia script is very simply and available in the code, so I will skip the explanation :)
Would love to hear your feedback and any scenario you might be interested on!