Diving into Data Analytics Learning Path

If you want to become a data analyst but don't know how or where to start, this blog is for you.

January 05, 2022 · 5 mins read
Category: Data Analytics

A question that I often receive from people who want to start their journey into becoming a data analyst is the following:

"What should I learn to become a data analyst?"

In this blog post, I am going to answer this million-dollar question 😃

Before starting, let's understand what a typical data analytics process looks like:

  • Stage-1: Getting The Data
  • Stage-2: Cleaning The Data
  • Stage-3: Storing the Data
  • Stage-4: Exploratory Data Analysis
  • Stage-5: Data Visualization
There are different tools available to make each of these five stages easier. Let's discuss them one by one.

Stage-1: Getting the Data

Getting the source data can be as easy as receiving well-formatted, well-documented data from the client or a data source, or as complex as scraping the data from a website.

The tools & technology that we must know to seamlessly get the required data from popular/common data sources are as follows:

  • Excel & Google Sheets
  • SQL Querying with SQL Server/PostgreSQL/MySQL
  • Web Scraping with Python/R

There are also other sources (Big Data sources) where we need to learn NoSQL querying and other techniques to get the data. However, that's a little advanced, and we'll cover it in some other blog post.
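
To make the SQL querying item a bit more concrete, here is a minimal sketch in Python. It uses the built-in sqlite3 module with an in-memory database as a stand-in for SQL Server/PostgreSQL/MySQL, and the orders table and its values are made up purely for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a real database server.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this example.
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "North", 120.0), (2, "South", 80.0), (3, "North", 200.0), (4, "East", 50.0)],
)

# Total order amount per region, largest first.
query = """
    SELECT region, SUM(amount) AS total_amount
    FROM orders
    GROUP BY region
    ORDER BY total_amount DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for region, total_amount in rows:
    print(region, total_amount)
```

The same SELECT / GROUP BY / ORDER BY pattern carries over to the other databases mentioned above, although connection code and some syntax details differ between them.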

Stage-2: Cleaning the Data

Cleaning and transforming data is an essential step in the data analytics process because we rarely get the data in the desired format.

Some of the tools worth learning for extract, transform, load (ETL) work are as follows:

  • Power Query
Any one of the following ETL Tools:

  • SSIS
  • Alteryx
  • Azure Data Factory (ADF)
  • KNIME
  • Talend
  • Informatica
My recommendation would be Alteryx along with SSIS/ADF.

Any one of the following programming language libraries:

  • Python libraries: NumPy/pandas
  • R libraries: Tidyverse
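
As a rough illustration of what cleaning looks like with pandas, the sketch below fixes a few typical problems: stray whitespace, inconsistent casing, numbers stored as text, a missing value, and a duplicate row. The column names and values are hypothetical, and real data usually needs more checks than this.

```python
import pandas as pd

# Hypothetical raw data with common quality issues.
raw = pd.DataFrame(
    {
        "customer": [" Alice", "bob ", "bob ", "Carol"],
        "city": ["pune", "Delhi", "Delhi", None],
        "amount": ["100", "250", "250", "75"],
    }
)

clean = (
    raw.assign(
        # Trim whitespace and standardize casing.
        customer=raw["customer"].str.strip().str.title(),
        # Standardize casing, then label missing cities explicitly.
        city=raw["city"].str.title().fillna("Unknown"),
        # Convert text to numbers so the column can be aggregated.
        amount=pd.to_numeric(raw["amount"]),
    )
    .drop_duplicates()       # remove rows that are identical after cleaning
    .reset_index(drop=True)  # renumber the remaining rows
)

print(clean)
```

Filling missing cities with "Unknown" is only one option. Depending on the analysis, dropping those rows or leaving them empty may be the better choice.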

Stage-3: Storing the Data

Understanding data storage along with data security and backup is also very important.

Therefore, it is important to have a basic understanding of database management systems, data warehouses, data lake storage, cloud storage systems (Azure/AWS) etc.

Stage-4: Exploratory Data Analysis

Exploratory Data Analysis (EDA) is the heart of any data analysis process. In this stage, we look for correlations, distributions, and patterns in our data that will lead us to impactful insights.

Tools that can help us in EDA are as follows:

  • Pivot Tables/Pivot Charts
  • Microsoft Power BI / Tableau
Any of the following programming language libraries:

  • Python Libraries: NumPy, pandas, Matplotlib, Seaborn, Plotly
  • R Libraries: Tidyverse, ggplot2

Several other tools can help with EDA. However, I've mentioned the ones that are most popular and have worked for me.
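
To show what looking at distribution, correlation, and patterns can mean in practice, here is a small pandas sketch. The sales figures are invented for illustration, so the numbers themselves carry no meaning.

```python
import pandas as pd

# Hypothetical sales data, made up for this example.
sales = pd.DataFrame(
    {
        "region": ["North", "North", "South", "South", "East", "East"],
        "ad_spend": [10, 20, 15, 25, 5, 30],
        "revenue": [110, 190, 160, 240, 60, 300],
    }
)

# Distribution: count, mean, spread, and quartiles of revenue.
summary = sales["revenue"].describe()

# Correlation: how strongly ad spend and revenue move together (Pearson).
corr = sales["ad_spend"].corr(sales["revenue"])

# Pattern: average revenue per region.
by_region = sales.groupby("region")["revenue"].mean()

print(summary)
print("Correlation:", round(corr, 3))
print(by_region)
```

A pivot table in Excel or a quick chart in Power BI/Tableau answers the same kinds of questions. The library route mainly helps when the analysis needs to be repeated or automated.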

Stage-5: Data Visualization

Visualizing and reporting the data clearly and concisely is the step that conveys all the hard work done in the prior stages.

Through data visualization, we present our insights to the end-user in a visually appealing manner.

Popular tools that can help us in performing data visualization are:

  • Microsoft Power BI
  • Tableau
  • Looker
  • Google Data Studio
  • MicroStrategy
  • QlikView/QlikSense

Conclusion:

There are any number of tools in the market to get the job done, but tools are only a means to achieve what we want.

We should always focus more on the concepts and the approach to solve the problem and then choose the right tool to solve it.

That said, there is a popular notion in the market that tools are not important, and let me tell you that it's not true.

Tools are as important as the concepts because without the proper knowledge of tools you can't achieve what you want.

I'll discuss this "Tools Vs. Concepts" debate in more detail in some other blog post, based on my experience.

Till then, keep up-skilling and stay healthy 👍