Editors for Python Programming

Several guys asked me which editor to use for Python programming, which one I use and why, which one is better and why, blah…blah…

I say, there are many! Believe me!

I recommend only two editors: 1.) Visual Studio Code, and 2.) Python’s IDLE Editor (Integrated Development & Learning Environment).

I use the Visual Studio Code as I have been working with Visual Studio since last 17 years. And, it provides me environment to code with other languages/scripts, etc. as well. It’s really fast, having lots of extensions, and more extensions coming up. The most important thing – It’s reliable as Microsoft product. And on top of that, it’s FREE, OPEN SOURCE and RUNS EVERYWHERE! 🙂 I have no reason why I should opt for any other editor when I am getting everything, more than that, using Visual Studio Code. Hence, my first preference and recommendation is Visual Studio Code for Python. Here is the link to download Visual Studio Code

Yes! If you’re not a Microsoft guy, then you can just use Python IDLE and learn the language, suffice for beginners. Once done, just install Visual Studio Code and use it.

Hope it gives you a direction to kick off your Python learning! Happy learning!



Connecting Oracle and Teradata Using SSIS

SSIS offers various options in order to connect to Oracle, but of course, not Teradata. Though, we have third party connectors to connect to Oracle as well as Teradata. And, the Microsoft officially recommends these connectors. The first choice being Microsoft Connectors for Oracle and Teradata by Attunity.

If you use OLE DB Source/Destination, you get Microsoft OLE DB Provider for Oracle connector. While if you use ADO NET Source/Destination, you get OracleClient Data Providers connector. Both of these options work fine, however the ETL process becomes slower, rather I would say VERY SLOW, if volume of data is huge. On top of this, if you are using 32-bit machine version, then it hurts more.

Hence, if you wish your ETL process to work faster, I recommend using Microsoft Connectors for Oracle and Teradata by Attunity. These connectors provide 60-70% performance upgrade, and THAT’S PROVEN. You can download these connectors from Microsoft site for free.  Click here to download it. Though, these are also available on Attunitty.com, but, of course, at a price. You will get connectors for SQL Server 2016 and SQL Server 2017 as shown in below image:

Having installed these connectors, restart your Visual Studio, and they will appear in SSIS Toolbox under Common category as shown in below image:

Hope this blog helps you out!

Connecting SAP HANA Using SSIS

In order to connect to SAP HANA, all we need to do is to install SAP HANA Client tools, download from SAP web site. Unfortunately, the Microsoft has officially not yet released any connector to help us out.

The SAP HANA Client tools are available in both 32-bit and 64-bit versions. What all we did is installed 64-bit of SAP HANA Client tools, but to our surprise, it didn’t work. We always received an error message: “The type initializer for Sap.Data.Hana.HanaConnection threw an exception.” despite providing correct credentials to log in. And, we had no idea what’s wrong. Then, just a thought to install 32-bit SAP HANA Client tools did the magic. And, we’re able to connect using whatever manner you wish i.e. by creating DSN on system, using Server Explorer in Visual Studio, using ADO.NET Source in SSIS, etc.

We recommend you to install 32-bit SAP HANA Client tools in Development environment. It may be possible that 64-bit version might work on Production environment. However, we’re yet to test that possibility. For the time being, good to go.

Hope this helps you out!






Blog space for Artificial Intelligence and Machine Learning

I’m going to start blogging on Artificial Intelligence, Machine Learning and Python soon…Watch my blog space 🙂

Different Types/Data Stores in Linked Services in Azure Data Factory

I am sure you know what a Linked Services in Azure Data Factory. If not, search my blog and read about it. I have already blogged about it earlier. Here, I’m going to list down all types/data stores that a Linked Service can connect to. This list will keep on changing in future as Azure team may remove/add other data stores.

Presently, there are 25 Types/Data Stores available for Linked Services to connect to. Here goes the list:

  1. Azure Storage:- [“type”: “AzureStorage”]
  2. Azure Storage SAS:- [“type”: “AzureStorageSas”]
  3. Azure Data Lake Store:- [“type”: “AzureDataLakeStore”]
  4. Azure SQL Database:- [“type”: “AzureSqlDatabase”]
  5. Azure SQL Data Warehouse:- [“type”: “AzureSqlDW”]
  6. Azure DocumentDB:- [“type”: “DocumentDb”]
  7. SQL Server (On-Premises):- [“type”: “OnPremisesSqlServer”]
  8. Cassandra:- [“type”: “OnPremisesCassandra”]
  9. MongoDB:- [“type”: “OnPremisesMongoDb”]
  10. Oracle:- [“type”: “OnPremisesOracle”]
  11. File System:- [“type”: “OnPremisesFileServer”]
  12. DB2:- [“type”: “OnPremisesDb2”]
  13. MySQL:- [“type”: “OnPremisesMySql”]
  14. Teradata:- [“type”: “OnPremisesTeradata”]
  15. PostgreSQL:- [“type”: “OnPremisesPostgreSql”]
  16. Sybase:- [“type”: “OnPremisesSybase”]
  17. HDFS:- [“type”: “Hdfs”]
  18. ODBC:- [“type”: “OnPremisesOdbc”]
  19. OData:- [“type”: “OData”]
  20. Web:- [“type”: “Web”]
  21. SalesForce:- [“type”: “Salesforce”]
  22. AmazonRedshift:- [“type”: “AmazonRedshift”]
  23. Amazon Web Services:- [“type”: “AwsAccessKey”]
  24. FtpServer:- [“type”: “FtpServer”]
  25. Azure Search:- [“type”: “AzureSearch”]

The “type” is mentioned in the JSON script and then we need to write “typeProperties” as per the type. For example, if type is AzureStorage, then we need to mention ConnectionString in the typeProperties section, if Web, then we need to mention Authentication Type and URL, etc.

I will be discussing each and every Data Store and its a few important properties in separate blogs. So, stay tuned!

What is Linked Services in Azure Data Factory

Linked Services in Azure Data Factory play the role of Connection String to external resources.

As we know that Data Factory is nothing but a kind of data integration cloud-service. In other words, it’s nothing but a cloud-based ETL service. Every ETL tool connects to data sources and destinations and processes the data. Hence, we require the connection details like Server Name or Drive Name/Folder, Blog Containers, Database Name, User Credentials, etc. We create a Linked Service in Data Factory to store this information, which can be used by the Activities in the pipelines. It helps use to connect to external resources like Azure SQL, on-prem SQL Server or Oracle, Azure HDInsight, Azure Storage, etc. We can create more than one Linked Services in the same Data Factory.

If you’re a SQL Server Integration Services (SSIS) developer, treat this as Connection String in Connection Manager. That’s it!

Developer usually add the word “LinkedServices” as suffix to the name of the Linked Services, though I prefer to add the word “LS”, which helps me out to have smaller names for these objects. Linked Services information is maintained as JSON in a physical file, which is also known as JSON Script.

Hope it helps you out to get an idea about the Linked Services.

Happy reading!

What is Azure Data Factory

In simple words, Azure Data Factory is nothing but an Extract, Transform and Load (ETL) tool, which does more than what an ETL tool can usually do and has been designed for the Cloud technology. You may say it’s a data integration service in the cloud. It’s just two and half years old kid of the Azure team, which is getting mature with the passage of time.

As per the MSDN documentation, Azure Data Factory is a cloud-based data integration service, to compose data storage, movement, and processing services into automated data pipelines.

From a BI developer perspective, it just a Cloud-based ETL tool. You can design connections (Linked Services), data store schema (DataSet/Table)  and an Activity, which help you to connect to a data source, pull the data out, transform it and then dump into a data destination. The beauty of this tool/service is that it can connect to diverse data stores i.e. relational to non-relational, big data to analytical data, which is very difficult to process using existing ETL tools such as SSIS or Informatica. And, the bonus is we can utilize Azure services i.e. Hardware/Software to improve the performance of the pipelines.

I have just given you an brief overview of Azure Data Factory. Click here if you wish to know more about it.

Happy reading!