Archive for February, 2017

Different Types/Data Stores in Linked Services in Azure Data Factory

I am sure you know what a Linked Services in Azure Data Factory. If not, search my blog and read about it. I have already blogged about it earlier. Here, I’m going to list down all types/data stores that a Linked Service can connect to. This list will keep on changing in future as Azure team may remove/add other data stores.

Presently, there are 25 Types/Data Stores available for Linked Services to connect to. Here goes the list:

  1. Azure Storage:- [“type”: “AzureStorage”]
  2. Azure Storage SAS:- [“type”: “AzureStorageSas”]
  3. Azure Data Lake Store:- [“type”: “AzureDataLakeStore”]
  4. Azure SQL Database:- [“type”: “AzureSqlDatabase”]
  5. Azure SQL Data Warehouse:- [“type”: “AzureSqlDW”]
  6. Azure DocumentDB:- [“type”: “DocumentDb”]
  7. SQL Server (On-Premises):- [“type”: “OnPremisesSqlServer”]
  8. Cassandra:- [“type”: “OnPremisesCassandra”]
  9. MongoDB:- [“type”: “OnPremisesMongoDb”]
  10. Oracle:- [“type”: “OnPremisesOracle”]
  11. File System:- [“type”: “OnPremisesFileServer”]
  12. DB2:- [“type”: “OnPremisesDb2”]
  13. MySQL:- [“type”: “OnPremisesMySql”]
  14. Teradata:- [“type”: “OnPremisesTeradata”]
  15. PostgreSQL:- [“type”: “OnPremisesPostgreSql”]
  16. Sybase:- [“type”: “OnPremisesSybase”]
  17. HDFS:- [“type”: “Hdfs”]
  18. ODBC:- [“type”: “OnPremisesOdbc”]
  19. OData:- [“type”: “OData”]
  20. Web:- [“type”: “Web”]
  21. SalesForce:- [“type”: “Salesforce”]
  22. AmazonRedshift:- [“type”: “AmazonRedshift”]
  23. Amazon Web Services:- [“type”: “AwsAccessKey”]
  24. FtpServer:- [“type”: “FtpServer”]
  25. Azure Search:- [“type”: “AzureSearch”]

The “type” is mentioned in the JSON script and then we need to write “typeProperties” as per the type. For example, if type is AzureStorage, then we need to mention ConnectionString in the typeProperties section, if Web, then we need to mention Authentication Type and URL, etc.

I will be discussing each and every Data Store and its a few important properties in separate blogs. So, stay tuned!

What is Linked Services in Azure Data Factory

Linked Services in Azure Data Factory play the role of Connection String to external resources.

As we know that Data Factory is nothing but a kind of data integration cloud-service. In other words, it’s nothing but a cloud-based ETL service. Every ETL tool connects to data sources and destinations and processes the data. Hence, we require the connection details like Server Name or Drive Name/Folder, Blog Containers, Database Name, User Credentials, etc. We create a Linked Service in Data Factory to store this information, which can be used by the Activities in the pipelines. It helps use to connect to external resources like Azure SQL, on-prem SQL Server or Oracle, Azure HDInsight, Azure Storage, etc. We can create more than one Linked Services in the same Data Factory.

If you’re a SQL Server Integration Services (SSIS) developer, treat this as Connection String in Connection Manager. That’s it!

Developer usually add the word “LinkedServices” as suffix to the name of the Linked Services, though I prefer to add the word “LS”, which helps me out to have smaller names for these objects. Linked Services information is maintained as JSON in a physical file, which is also known as JSON Script.

Hope it helps you out to get an idea about the Linked Services.

Happy reading!

What is Azure Data Factory

In simple words, Azure Data Factory is nothing but an Extract, Transform and Load (ETL) tool, which does more than what an ETL tool can usually do and has been designed for the Cloud technology. You may say it’s a data integration service in the cloud. It’s just two and half years old kid of the Azure team, which is getting mature with the passage of time.

As per the MSDN documentation, Azure Data Factory is a cloud-based data integration service, to compose data storage, movement, and processing services into automated data pipelines.

From a BI developer perspective, it just a Cloud-based ETL tool. You can design connections (Linked Services), data store schema (DataSet/Table)  and an Activity, which help you to connect to a data source, pull the data out, transform it and then dump into a data destination. The beauty of this tool/service is that it can connect to diverse data stores i.e. relational to non-relational, big data to analytical data, which is very difficult to process using existing ETL tools such as SSIS or Informatica. And, the bonus is we can utilize Azure services i.e. Hardware/Software to improve the performance of the pipelines.

I have just given you an brief overview of Azure Data Factory. Click here if you wish to know more about it.

Happy reading!