Data Analytics


Data has become central to fields such as Machine Learning and Data Science, where it is used to build predictive models. Examples include estimating how much of a population an epidemic may affect, forecasting stock market values, and predicting the weather over the coming days and months.

Analyzing data is the first step in making these models work and give us results. Data Visualization is an important skill for understanding both the data that exists and the outcome of these models.

So let us get introduced to our first Python library, Matplotlib, which is a first step towards Data Visualization.


Matplotlib is a Python plotting library used for visualizing datasets in different ways. Pyplot is a Matplotlib module that provides a MATLAB-like interface. Matplotlib is designed to be as usable as MATLAB, with the added ability to use Python and the advantage of being free and open-source.

Importing the Dataset

import pandas as pd

dataset = pd.read_csv("dataset.csv")  # replace "dataset.csv" with the location of the dataset

x = dataset.iloc[:, 0]  # setting first column to x (as a 1-D Series)

y = dataset.iloc[:, 1]  # setting second column to y (as a 1-D Series)
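To make the column selection concrete, here is a minimal sketch that reads a small in-memory CSV in place of a real file. The column names (day, temperature) and the values are made up for illustration. It shows that selecting a column with a single integer gives a 1-D Series, while a slice such as 0:1 keeps a 2-D DataFrame with one column.

```python
import io
import pandas as pd

# Hypothetical data standing in for a CSV file on disk.
csv_text = "day,temperature\n1,21.5\n2,23.0\n3,22.1\n"
dataset = pd.read_csv(io.StringIO(csv_text))

x = dataset.iloc[:, 0]        # first column as a 1-D Series
y = dataset.iloc[:, 1]        # second column as a 1-D Series
x_2d = dataset.iloc[:, 0:1]   # a slice keeps a 2-D DataFrame with one column

print(x.shape, y.shape, x_2d.shape)
```

The 1-D form matches what the plotting step expects, which is why a single integer index may be the more convenient choice here.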

Importing the library

import matplotlib.pyplot as plt

Plotting the curve

plt.plot(x, y)

x and y have to be 1-D arrays of the same length.
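As a rough, self-contained illustration of the basic line plot, the sketch below plots two short lists. The values and the output file name are made up. The Agg backend is selected only so the script also runs on a machine without a display; in an interactive session plt.show() would normally be used instead of saving to a file.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up sample values; any two 1-D sequences of equal length work.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

plt.figure()                   # start a fresh figure
lines = plt.plot(x, y)         # returns a list of Line2D objects, one per curve
plt.savefig("line_plot.png")   # hypothetical file name
```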

Adding a title to the plot

plt.title("title of the plot")

Labelling the axes

plt.xlabel("x-axis label")

plt.ylabel("y-axis label")
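Putting the title and axis labels together, a minimal sketch might look like the following. The data, the label text and the file name are invented for illustration. plt.gca() is used only to fetch the current axes so the labels can be inspected afterwards.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]          # hypothetical time values
y = [0.0, 1.5, 3.1, 4.4]  # hypothetical distance values

plt.figure()
plt.plot(x, y)
plt.title("Distance covered over time")
plt.xlabel("Time (s)")
plt.ylabel("Distance (m)")

ax = plt.gca()                     # current axes, handy for checking what was set
plt.savefig("labelled_plot.png")   # hypothetical file name
```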

Additional features

The plot created above is the most basic line plot and is suited to a single curve. Often there is a need to plot more than one curve on a single graph to compare how the trends vary. In that case, some additional features help to tell the curves apart. These include colouring and labelling each curve, adding a legend, etc.

plt.plot(x, y, color='red', label='label1', ls='--', ms=5, marker='s')  # example values: colour, legend label, line style, marker size, marker style ('s' = square)

More of these features can be found in the Matplotlib gallery.

plt.legend(loc='upper right')  # loc is the location of the legend, e.g. 'upper right' or 'best'
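To see these options working together, the sketch below draws two curves on the same graph and adds a legend. The data, labels, colours and file name are illustrative choices, not part of any particular dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
squares = [v ** 2 for v in x]
cubes = [v ** 3 for v in x]

plt.figure()
# Each call to plt.plot adds one curve to the current axes.
plt.plot(x, squares, color="red", label="x squared", ls="--", marker="s", ms=6)
plt.plot(x, cubes, color="blue", label="x cubed", ls="-", marker="o", ms=6)

legend = plt.legend(loc="upper left")  # legend entries come from the label arguments
plt.savefig("two_curves.png")          # hypothetical file name
```

The legend picks up its entries from the label argument of each plt.plot call, so a curve plotted without a label is left out of the legend.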
