4.1 Pandas in Python

Introduction to Pandas: -

Pandas is a fast and widely used Python library for data analysis and data manipulation.

It is a high-level data manipulation tool developed by Wes McKinney.

It is used in a wide range of academic and commercial fields, including finance, economics, statistics, and analytics.

The main uses of Pandas are:

Data Analysis.

Data preparation.

Data manipulation.

Data modelling.

Features of Pandas

It has a fast and efficient DataFrame object with default and customized indexing.

It has tools for loading data into in-memory data objects from different file formats.

It is used for data alignment and integrated handling of missing data.

Easy handling of missing data (represented as NaN) in floating point as well as non-floating-point data.

Reshaping and pivoting of data sets.

Label-based slicing, indexing and sub-setting of large data sets.

Columns from a data structure can be deleted or inserted.

Group by data for aggregation and transformations.

High performance merging and joining of data.

Time Series functionality.
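
A few of the features listed above can be sketched in one short program. The table contents, column names, and manager names below are hypothetical values invented for illustration; only the Pandas calls themselves (fillna, groupby, merge, loc) are real library functions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented only for this illustration.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "amount": [100.0, np.nan, 300.0, 200.0],
})

# Missing data: replace NaN in the "amount" column with 0.0.
filled = sales.fillna({"amount": 0.0})

# Group by: total amount per region.
totals = filled.groupby("region")["amount"].sum()

# Merging: join a second (hypothetical) table on the shared "region" column.
managers = pd.DataFrame({
    "region": ["East", "West"],
    "manager": ["Asha", "Ravi"],
})
merged = filled.merge(managers, on="region")

# Label-based selection: keep only the rows for the "East" region.
east = merged.loc[merged["region"] == "East"]

print(totals)
print(east)
```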

Benefits of Pandas: -

The main advantages of Pandas are:

It has functions for analyzing, cleaning, exploring, and manipulating data.

It allows us to analyze large data sets and draw conclusions based on statistical theory.

It can clean messy data sets, and make them readable and relevant.

It helps to shorten the procedure of handling data. With the time saved, we can focus more on data analysis algorithms.

Installing Pandas: -

On Windows, click the Start button to open the Start menu.

Type “cmd,” and the Command Prompt app should appear as a listing in the Start menu.

Open the Command Prompt and enter the following command.

py -m pip install pandas
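
Once the installation finishes, a quick check along these lines should confirm that Python can import the package. The version number printed depends on your installation.

```python
import pandas as pd

# Prints the installed Pandas version; an ImportError here means
# the installation did not succeed for this Python interpreter.
print(pd.__version__)
```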

Introduction to Pandas Objects: -

Pandas supports two main data structures:

1. Series: - one-dimensional labeled arrays.

2. DataFrames: - two-dimensional data structure with columns, much like a table.

1. Series: -

A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.).

A Pandas Series object is like a column in a table.

Series are generally created from:

a. Arrays

b. Lists

c. Dictionaries

a) Create a Series from Arrays: -

First, import the numpy module and create an array with its array( ) function. Then pass that array to the Series( ) function of pandas.

Example: -

Output: -
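
This step can be sketched as follows. The array values are arbitrary numbers chosen for illustration; Pandas assigns the default index 0, 1, 2, ... when none is given.

```python
import numpy as np
import pandas as pd

# Illustrative values; any one-dimensional NumPy array works here.
data = np.array([10, 20, 30, 40])

# Build the Series; the index defaults to 0, 1, 2, 3.
s = pd.Series(data)

print(s)
```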

b) Create a Series from Lists: -

To create a Series from a list, first create the list and then pass it to the Series( ) function.

Example: -

Output: -
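
A minimal sketch of this step, assuming a hypothetical list of language names. The optional index argument shows how custom labels can replace the default 0, 1, 2, ... index.

```python
import pandas as pd

# Hypothetical list contents, chosen only for illustration.
names = ["Python", "Java", "C++"]

# Pass the list to Series(); custom labels are optional.
s = pd.Series(names, index=["a", "b", "c"])

print(s)
print(s["b"])  # look up a value by its label
```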

c) Create a Series from dict: -

We can also create a Series from a dictionary.

All the keys in the dictionary will become the indices of the Series object, whereas all the values from the key-value pairs in the dictionary will become the values (data) of the Series object.

Example: -

Output: -
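
The key-to-index mapping described above can be sketched like this. The subject names and marks are hypothetical sample data.

```python
import pandas as pd

# Hypothetical dictionary: keys become the index, values become the data.
marks = {"Maths": 85, "Science": 90, "English": 78}

s = pd.Series(marks)

print(s)
print(s["Science"])  # access a value through its key
```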

2) DataFrame: -

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

A DataFrame object is useful for representing data in the form of rows and columns.

Data frames are generally created from:

a. List

b. List of tuples

c. Dictionary

d. Excel spreadsheet files

e. .csv (comma-separated values) files

a) Create a DataFrame from Lists: -

A DataFrame can be created from a single list.

To do this, we pass a Python list as a parameter to the pandas DataFrame( ) function.

DataFrame( ) function is used to create a dataframe in Pandas.

Example: -

Output: -
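
A minimal sketch of this step. The list values and the column name "Value" are assumptions made for illustration; without the columns argument, Pandas names the single column 0.

```python
import pandas as pd

# Illustrative list; each element becomes one row.
data = [10, 20, 30, 40]

# "Value" is a hypothetical column name.
df = pd.DataFrame(data, columns=["Value"])

print(df)
```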

b) Create a DataFrame from List of Tuples: -

A DataFrame can be created from a list of tuples.

Each tuple is treated as one row of data.

For example, to store the data of 3 employees, we create 3 tuples.

Example: -

Output: -
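
The three-employee case can be sketched as follows. The employee records are invented sample data, and the SALARY column name is an assumption; EMPID and ENAME follow the names used elsewhere in this section.

```python
import pandas as pd

# Hypothetical employee records: one tuple per row.
employees = [
    (1001, "Asha", 45000.0),
    (1002, "Ravi", 52000.0),
    (1003, "Meena", 61000.0),
]

# Column names label the positions within each tuple.
df = pd.DataFrame(employees, columns=["EMPID", "ENAME", "SALARY"])

print(df)
```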

c) Create a DataFrame from Dictionary: -

It is also possible to create a DataFrame from a Python dictionary that contains employee data.

A dictionary stores data in the form of key-value pairs.

In this case, we take 'EMPID' and 'ENAME' as keys and corresponding lists as values.

Example: -

Output: -
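
A minimal sketch using the 'EMPID' and 'ENAME' keys mentioned above. The id numbers and names are hypothetical sample values.

```python
import pandas as pd

# Each key becomes a column; each list holds that column's values.
# The values themselves are invented for illustration.
emp = {
    "EMPID": [1001, 1002, 1003],
    "ENAME": ["Asha", "Ravi", "Meena"],
}

df = pd.DataFrame(emp)

print(df)
```

All lists in the dictionary must have the same length, since each position corresponds to one row.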

d) Create a DataFrame from Excel Spreadsheet: -

We can also read an excel file as a DataFrame.

Let us assume an Excel spreadsheet file named "Emp.xlsx".

We have created an Excel file that contains the employee id number, employee name, job, and salary. This file is saved with the file name "Emp" and the extension "xlsx".

To create a DataFrame, we first need to import the pandas package.

We also need a package that can read Excel files. For .xlsx files, pandas uses the openpyxl package. The xlrd package is needed only for the older .xls format, because xlrd 2.0 and later no longer reads .xlsx files.

Install these packages from the Command Prompt:

py -m pip install xlrd

py -m pip install openpyxl

To read the data from the Emp.xlsx file, the read_excel( ) function of the pandas package is used.

Example: -

Output: -
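
Reading the workbook can be sketched as follows. The helper name load_employees is hypothetical, and the sketch assumes that Emp.xlsx sits in the current working directory and that openpyxl is installed; if the file is absent, the script only prints a notice.

```python
from pathlib import Path

import pandas as pd


def load_employees(path):
    """Read the first worksheet of an Excel workbook into a DataFrame.

    Hypothetical helper for this sketch; it simply wraps pandas.read_excel.
    """
    return pd.read_excel(path)


# File name taken from the text; its location is an assumption.
excel_path = Path("Emp.xlsx")

if excel_path.exists():
    df = load_employees(excel_path)
    print(df)
else:
    print(f"{excel_path} not found; place the workbook next to this script")
```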

e) Create a DataFrame from .CSV file: -

CSV stands for "Comma Separated Values."

In many cases, the data will be in the form of .csv files.

It is the simplest form of storing data in tabular form as plain text.

It is similar to an Excel file but stores only the raw values as plain text, without formatting or formulas.

It is important to know how to work with .csv files because data scientists rely on them in day-to-day work.

We have created a CSV file that contains the same data as the Excel file, i.e. the employee id number, employee name, job, and salary. This file is saved with the file name "Emp" and the extension "csv".

To read the data from the Emp.csv file, the read_csv( ) function of the pandas package is used.

Example: -

Output: -
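
A self-contained sketch of this step. Because the actual data file is not available here, the code first writes a small Emp.csv into a temporary folder and then reads it back with read_csv( ). The rows and the JOB and SALARY column names are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file contents: a header line followed by three rows.
csv_text = (
    "EMPID,ENAME,JOB,SALARY\n"
    "1001,Asha,Analyst,45000\n"
    "1002,Ravi,Developer,52000\n"
    "1003,Meena,Manager,61000\n"
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "Emp.csv")

    # Create the sample file so the example can run anywhere.
    with open(csv_path, "w", newline="") as f:
        f.write(csv_text)

    # The first line of the file is used as the column names.
    df = pd.read_csv(csv_path)

print(df)
```

With a real file, only the read_csv( ) call is needed, with the path to that file as its argument.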

ROHIT's Smart Class Room