Data security is the top priority when migrating to a cloud environment, especially when data travels over the public internet. Technically, when moving from on-premises into Azure, whether as data flows or as files, there are several ways to protect the data in transit, depending on the tools being used.
- Data Connection Encryption.
Azure SQL Database and Azure SQL DW generate connection strings (ADO.NET, JDBC, ODBC and PHP) that can be used in client applications. Be aware that "Encrypt=True" is always added (in ADO.NET and JDBC only). This setting should always be kept when using the connection string in a client application.
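As a quick sketch, the generated ADO.NET-style string looks like the one below (server, database and user names are placeholders, not real resources); the point is simply that the Encrypt/TrustServerCertificate pair must survive into the client application:

```python
# Hypothetical ADO.NET-style connection string as generated by the Azure
# portal (all names are placeholders).
conn_str = (
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=mydw;"
    "User ID=admin_user@myserver;"
    "Password={your_password};"
    "Encrypt=True;"
    "TrustServerCertificate=False;"
    "Connection Timeout=30;"
)

def parse_conn_str(s):
    """Split a key=value;... connection string into a dict."""
    pairs = (p for p in s.split(";") if p)
    return {k.strip(): v.strip() for k, v in (p.split("=", 1) for p in pairs)}

settings = parse_conn_str(conn_str)
# The encryption flag must be kept when the string is reused in client code.
assert settings["Encrypt"] == "True"
```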
- SSIS (SQL Server Integration Services) Data Flow Encryption.
Most SQL Server practitioners will want these tips when using SSIS as the main tool for data transfer. When creating the connection to Azure DW, Azure SQL Server or Azure NoSQL Database using OLE DB, make sure to change 'Use Encryption for Data' to True; this enables SSL encryption on the connection:
If using the ADO.NET provider to connect to Azure DW, there are more settings to be aware of:
Column Encryption Setting: this feature is currently supported only in SQL Server 2016 (not in Azure DW). If the target is Azure DW, this value should be Disabled. The property is used to set up encryption for particular columns.
Encrypt: when True, SQL Server uses SSL encryption for all data sent between the client and server if the server has a certificate installed. In Azure the server-side certificate is already provisioned, so this setting should remain True.
- Data Factory Data Encryption
Data Factory is an Azure-based data movement service; it can move data not only from on-premises into Azure but also between stores inside the Azure environment.
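To give a feel for how Data Factory describes a movement, here is a rough sketch of a Copy activity definition in the v1-era JSON style, written as a Python dict. The pipeline and dataset names are hypothetical, and the exact schema should be taken from the Data Factory documentation rather than from this sketch:

```python
import json

# Hypothetical pipeline copying from an on-premises SQL Server dataset
# into an Azure SQL DW dataset. Names are illustrative only.
pipeline = {
    "name": "OnPremToDwPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopySalesData",
                "type": "Copy",
                "inputs": [{"name": "OnPremSqlDataset"}],
                "outputs": [{"name": "AzureSqlDwDataset"}],
                "typeProperties": {
                    "source": {"type": "SqlSource"},
                    "sink": {"type": "SqlDWSink"},
                },
            }
        ]
    },
}

print(json.dumps(pipeline, indent=2))
```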
Our first practice will be loading a local SQL Server database (on-premises) into Azure SQL DW. There are several things we need to think about for this migration.
- Technical approach. There are several ways to move data: SQL Server Integration Services, Data Factory, the Data Warehouse Migration Utility and Azure commands...
- Historical loading and incremental loading. Consider that incremental data may need a different loading approach.
- Security configuration. How to move user/group accounts into Azure DW.
- SQL code migration. Even though this is SQL to SQL, there are still problems that need to be addressed, for example limitations in Azure DW.
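On the security-configuration point, one minimal sketch (assuming SQL authentication, with a hypothetical helper and account names) is to recreate each account as a pair of statements: a CREATE LOGIN executed against the master database, plus a CREATE USER executed against the data warehouse database itself:

```python
# Hypothetical helper: emits the two T-SQL statements needed to recreate
# one SQL-authentication account on Azure. CREATE LOGIN runs on master;
# CREATE USER runs on the target DW database.
def migration_statements(login, password):
    return [
        f"CREATE LOGIN [{login}] WITH PASSWORD = '{password}';",  # on master
        f"CREATE USER [{login}] FOR LOGIN [{login}];",            # on the DW
    ]

for stmt in migration_statements("etl_loader", "S0me$trongPwd"):
    print(stmt)
```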
No matter which approach we use, there are three major steps we have to follow: Move Schema, Move Data and Move Code.
1) Move Schema. Moving the schema means we have to create the tables' schema before loading the data. Typically in SQL Server you can generate table scripts from SSMS, or by using third-party tools. Keep in mind that not all SQL Server objects are supported in Azure DW, for example primary keys. In the generated scripts, add the following options after the 'CREATE TABLE' statement:
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
This defines the table's distribution (discussed later) and creates a clustered columnstore index.
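Putting the pieces together, amending a generated script can be as simple as appending the WITH clause to each plain CREATE TABLE. The helper and table below are illustrative only:

```python
# Hypothetical helper that rewrites a generated CREATE TABLE script for
# Azure SQL DW by appending the distribution / index options.
DW_OPTIONS = "WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX)"

def to_dw_script(create_table_sql):
    # Drop any trailing semicolon, then append the Azure DW WITH clause.
    return create_table_sql.rstrip().rstrip(";") + "\n" + DW_OPTIONS + ";"

script = to_dw_script(
    "CREATE TABLE dbo.FactSales (SaleId INT, Amount DECIMAL(18,2))"
)
print(script)
```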
2) Move Data. This is the core step. Moving data from on-premises SQL Server to Azure DW involves two kinds of 'moving': historical loading and incremental loading. In data warehousing methodology, historical loading means a one-time, fresh load; it is used to load the entire on-premises data set into Azure DW, and it is the scenario most articles cover. Incremental loading means we only load new records into Azure DW, and this process runs on a scheduled frequency. Incremental and historical loading can use the same techniques, but not always. I will cover this discussion later.
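One common incremental-loading pattern (a sketch, not from the original text) is a watermark: persist the highest modified timestamp loaded so far, and on each scheduled run extract only rows newer than it. The table and column names below are hypothetical:

```python
from datetime import datetime

def incremental_query(table, watermark_column, last_watermark):
    """Build the extract query for rows newer than the last load."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_column} > '{last_watermark.isoformat()}'"
    )

# In practice the watermark is read from a control table, not hard-coded.
last_loaded = datetime(2016, 7, 1, 0, 0, 0)
q = incremental_query("dbo.FactSales", "ModifiedDate", last_loaded)
print(q)
```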
Move data by using SQL Server Integration Services
This is probably the easiest way to do both historical and incremental loading.
I am using this series to describe topics related to migrating into Azure SQL Data Warehouse, Microsoft's MPP (Massively Parallel Processing) data platform in Azure. The name is confusing for most people familiar with on-premises SQL Server: Azure SQL Data Warehouse is in fact an Azure SQL Database that supports MPP through distributed compute nodes, so it has to be created under the Azure SQL Database service.
I am trying to use multiple different DW systems to evaluate the migration: SQL Server, Oracle, DB2, RDS, etc. I will also use this series to compare the pros and cons of the different DWs from a technical perspective (I hope I can :)).
Setting up and initializing Azure SQL Data Warehouse is pretty simple, much easier than installing a SQL Server instance on a local server :). A few steps:
- Create SQL DW (Specify or create new SQL Server)
- Specify the performance tier (DWU units; 100 DWU initially)
- Specify administrator account
The following screenshot shows the server name just created:
A very basic question at this point is how we can connect to and operate this Azure DW. Fortunately there is a simple and easy way to do that: SQL Server Management Studio 2016. The screenshot below shows that we connect to the Azure SQL server instead of the Azure SQL DW, since Azure SQL DW is a DATABASE.
When connecting to Azure SQL, the system asks you to log in to Azure from the client computer so that the Azure SQL firewall can allow the computer's IP address to connect to Azure SQL. The screenshot below shows the result after connecting to Azure SQL, with the first Azure SQL DW: bridatadw1.
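As a side note, the firewall rule that this login flow creates can also be managed directly with the `sp_set_firewall_rule` procedure, executed against the master database of the Azure SQL server. A small sketch, with a placeholder rule name and IP range:

```python
# Hypothetical rule name and IP range; the generated statement is run
# against the master database of the Azure SQL server.
def firewall_rule_sql(name, start_ip, end_ip):
    return (
        f"EXEC sp_set_firewall_rule @name = N'{name}', "
        f"@start_ip_address = '{start_ip}', @end_ip_address = '{end_ip}';"
    )

print(firewall_rule_sql("office", "203.0.113.10", "203.0.113.10"))
```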