4.1 Pandas in Python

Introduction to Pandas: -

  • Pandas is one of the most widely used Python libraries for data analysis and data manipulation.
  • It is a high-level data manipulation tool developed by Wes McKinney.
  • It is used in a wide range of academic and commercial fields, including finance, economics, statistics, and analytics.
  • Pandas is mainly used for:

                    Data Analysis.

                    Data preparation.

                    Data manipulation.

                    Data modelling.

 

Features of Pandas

  • It has a fast and efficient DataFrame object with default and customized indexing.
  • It has tools for loading data into in-memory data objects from different file formats.
  • It supports data alignment and integrated handling of missing data.
  • Missing data (represented as NaN) is easy to handle in both floating-point and non-floating-point data.
  • Reshaping and pivoting of data sets.
  • Label-based slicing, indexing and subsetting of large data sets.
  • Columns can be inserted into or deleted from a data structure.
  • Group-by functionality for aggregating and transforming data.
  • High-performance merging and joining of data.
  • Time Series functionality.
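A few of these features can be sketched in a short program. The example below is only an illustration: the sales and managers tables, their column names, and the values in them are made up for this sketch. It shows missing-data handling with fillna( ), a group-by aggregation, label-based selection with .loc, and a merge.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing value (NaN).
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "amount": [100.0, np.nan, 150.0, 200.0],
})

# Missing data: replace NaN in the "amount" column with 0.
filled = sales.fillna({"amount": 0.0})

# Group by: total amount per region.
totals = filled.groupby("region")["amount"].sum()

# Label-based selection: pick the total for one region by its label.
east_total = totals.loc["East"]
print(east_total)  # 250.0

# Merging: attach a (hypothetical) manager name to every sales row.
managers = pd.DataFrame({
    "region": ["East", "West"],
    "manager": ["Asha", "Ravi"],
})
merged = filled.merge(managers, on="region", how="left")
print(merged)
```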

 

Benefits of Pandas: - The main advantages of Pandas are:

  • It has functions for analyzing, cleaning, exploring, and manipulating data.
  • It allows us to analyze large data sets and draw conclusions based on statistical theory.
  • It can clean messy data sets and make them readable and relevant.
  • It shortens the work of handling data. With the time saved, we can focus more on data analysis algorithms.

 

Installing Pandas: -

  • On Windows, click the Start button to open the Start menu.
  • Type “cmd”; the Command Prompt app should appear as a listing in the Start menu. Open it.
  • Enter the following command in the terminal:

 

                py -m pip install pandas
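To confirm that the installation worked, a quick check like the following can be run in Python. It only imports the package and prints its version. The version string you see depends on what pip installed.

```python
import pandas as pd

# If this import succeeds and a version is printed, pandas is installed.
print(pd.__version__)
```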

 

Introduction to Pandas Objects: - Pandas supports two main data structures:

1. Series: - a one-dimensional labeled array.

2. DataFrame: - a two-dimensional data structure with rows and columns, much like a table.

 

1. Series: -

  • A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.).
  • A Pandas Series object is like a column in a table.
  • Series are generally created from:

            a. Arrays

            b. Lists

            c. Dictionaries

 

a) Create a Series from Arrays: -

  • First, import the numpy module and create an array with its array( ) function; then pass that array to the Series( ) function of pandas.

             Example: - 
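A minimal sketch of this approach, assuming a small array of made-up floating-point values. The variable names and the custom labels "a", "b", "c" are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Build a NumPy array first, then wrap it in a Series.
arr = np.array([10.5, 20.5, 30.5])
s = pd.Series(arr)

print(s)
# 0    10.5
# 1    20.5
# 2    30.5
# dtype: float64

# The same data with custom index labels instead of 0, 1, 2.
labeled = pd.Series(arr, index=["a", "b", "c"])
print(labeled.loc["b"])  # 20.5
```

Without an explicit index, pandas labels the values 0, 1, 2, and so on.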





b) Create a Series from Lists: -

  • To create a Series from a list, first create the list and then pass it to the Series( ) function.

                 Example: -
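A minimal sketch, assuming a short list of made-up integer values:

```python
import pandas as pd

# Step 1: create an ordinary Python list.
values = [10, 20, 30, 40]

# Step 2: pass the list to Series(); the index defaults to 0, 1, 2, ...
s = pd.Series(values)

print(s)
print(s.iloc[0])   # first element by position: 10
print(len(s))      # 4
```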




c) Create a Series from dict: -

  • We can also create a Series from a dictionary (dict).
  • All the keys in the dictionary will become the indices of the Series object, whereas all the values from the key-value pairs in the dictionary will become the values (data) of the Series object.

              Example: -
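A minimal sketch, assuming a made-up dictionary of marks keyed by student name. The names and numbers are illustrative only.

```python
import pandas as pd

# Hypothetical marks keyed by student name.
marks = {"Amit": 82, "Bina": 91, "Chen": 77}
s = pd.Series(marks)

# The dictionary keys became the index labels ...
print(s.index.tolist())  # ['Amit', 'Bina', 'Chen']

# ... and the dictionary values became the data.
print(s.tolist())        # [82, 91, 77]

# A value can be looked up by its label, much like in the dictionary.
print(s["Bina"])         # 91
```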





2) DataFrame: -

  • A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.
  • A DataFrame object is useful for representing data in the form of rows and columns.
  • DataFrames are generally created from:

                a. List

                b. List of tuples

                c. Dictionary

                d. Excel spreadsheet files

                e. .csv (comma separated values) files

 

a) Create a DataFrame from Lists: -

  • A DataFrame can be created from a single list.
  • To do this, we pass a Python list as a parameter to the pandas DataFrame( ) function.
  • The DataFrame( ) function is used to create a DataFrame in Pandas.

            Example: -
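A minimal sketch, assuming a made-up list of employee names. The column name "ENAME" is passed explicitly here. Without it, pandas would label the single column 0.

```python
import pandas as pd

# A single Python list becomes a DataFrame with one column.
names = ["Amit", "Bina", "Chen"]
df = pd.DataFrame(names, columns=["ENAME"])

print(df)
print(df.shape)  # (3, 1): three rows, one column
```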





b) Create a DataFrame from List of Tuples: -

  • A DataFrame can be created from a list of tuples.
  • Each tuple is treated as one row of data.
  • For example, to store the data of 3 employees, we create 3 tuples.

            Example: -
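A minimal sketch with three made-up employees. The column names "EMPID", "ENAME" and "SALARY" and all values are assumptions for illustration.

```python
import pandas as pd

# One tuple per employee; each tuple becomes one row.
employees = [
    (1001, "Amit", 45000.0),
    (1002, "Bina", 52000.0),
    (1003, "Chen", 61000.0),
]

# Column names are supplied separately, in the same order as the tuple fields.
df = pd.DataFrame(employees, columns=["EMPID", "ENAME", "SALARY"])

print(df)
print(df.shape)  # (3, 3): three rows, three columns
```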





c) Create a DataFrame from Dictionary: -

  • It is also possible to create a DataFrame from a Python dictionary that contains employee data.
  • A dictionary stores data in the form of key-value pairs.
  • In this case, we take 'EMPID' and 'ENAME' as keys and corresponding lists as values.

            Example: -
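A minimal sketch using the 'EMPID' and 'ENAME' keys described above. The ID numbers and names are made up for illustration.

```python
import pandas as pd

# Each key becomes a column name; each list becomes that column's values.
emp_data = {
    "EMPID": [1001, 1002, 1003],
    "ENAME": ["Amit", "Bina", "Chen"],
}
df = pd.DataFrame(emp_data)

print(df)
print(df.columns.tolist())  # ['EMPID', 'ENAME']
```

All lists in the dictionary must have the same length, because each one fills a column of the same table.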





d) Create a DataFrame from Excel Spreadsheet: -

  • We can also read an Excel file as a DataFrame.
  • Assume we have created an Excel spreadsheet that contains the employee ID number, employee name, job and salary of each employee, and saved it as "Emp.xlsx".



  • To create a DataFrame, we first need to import the pandas package.
  • To read .xlsx files, pandas also needs the openpyxl package. The xlrd package is only needed for the older .xls format.
  • Install both packages from the Command Prompt:

 

                    py -m pip install xlrd

                    py -m pip install openpyxl

 

  • To read the data from the Emp.xlsx file, we use the read_excel( ) function of the pandas package.

            Example: -
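A minimal sketch, under some assumptions. So that it runs without an existing spreadsheet, it first writes made-up employee data to a temporary Emp.xlsx and then reads it back. The column names and values are hypothetical stand-ins for the real file's contents. If openpyxl is missing, it prints a hint instead of failing.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical employee data standing in for the contents of Emp.xlsx.
sample = pd.DataFrame({
    "EMPID": [1001, 1002, 1003],
    "ENAME": ["Amit", "Bina", "Chen"],
    "JOB": ["Clerk", "Analyst", "Manager"],
    "SALARY": [45000, 52000, 61000],
})

df = None
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "Emp.xlsx"
    try:
        sample.to_excel(path, index=False)  # write a small .xlsx to read back
        df = pd.read_excel(path)            # read the spreadsheet as a DataFrame
    except ImportError:
        print("openpyxl is not installed; run: py -m pip install openpyxl")

if df is not None:
    print(df)
```

With a real spreadsheet already saved in the working directory, the reading step reduces to df = pd.read_excel("Emp.xlsx").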





e) Create a DataFrame from .CSV file: -

  • CSV stands for "Comma Separated Values."
  • In many cases, the data will be in the form of .csv files.
  • It is the simplest form of storing data in tabular form as plain text.
  • It is similar to an Excel file, but simpler: it stores only plain text, with no formatting or formulas.
  • It is important to know how to work with .csv files because data scientists rely on them in day-to-day work.


  • We have created a CSV file that contains the same data as the Excel file, i.e. the employee ID number, employee name, job, and salary of each employee of a company. This file is saved with the file name "Emp" and the extension "csv".
  • To read the data from the Emp.csv file, we use the read_csv( ) function of the pandas package.

            Example: -
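A minimal sketch, under some assumptions. So that it runs on its own, it writes a small Emp.csv with made-up rows to a temporary folder and then reads it with read_csv( ). The header names and values are hypothetical stand-ins for the real file's contents.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical contents of Emp.csv: a header row, then one line per employee.
csv_text = (
    "EMPID,ENAME,JOB,SALARY\n"
    "1001,Amit,Clerk,45000\n"
    "1002,Bina,Analyst,52000\n"
    "1003,Chen,Manager,61000\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "Emp.csv"
    path.write_text(csv_text, encoding="utf-8")

    # The first line of the file is used as the column names by default.
    df = pd.read_csv(path)

print(df)
print(df.shape)  # (3, 4): three rows, four columns
```

With a real file already saved in the working directory, the reading step reduces to df = pd.read_csv("Emp.csv").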




