4.2 Operations on Data Frame

Operations on Data Frames: -

  • Operations on data frames help us analyze or manipulate the data.
  • Pandas provides many useful operations for working with a data frame.

 

1) To Find Number of Rows and Columns: -

  • We can find the number of rows and columns in a data frame by using the shape attribute.
  • It returns a tuple that contains the number of rows followed by the number of columns.
  • If we want only the number of rows or only the number of columns, we can read that value from the tuple by index (0 for rows, 1 for columns).

            Example: -


            output: -    
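
As a minimal sketch of the shape attribute, assuming a small hypothetical data frame df (the names, marks and results below are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John", "Sara", "Kiran"],
    "Marks": [82, 45, 67, 91, 38, 74],
    "Result": ["Pass", "Fail", "Pass", "Pass", "Fail", "Pass"],
})

shape = df.shape       # tuple: (number of rows, number of columns)
n_rows = df.shape[0]   # rows only
n_cols = df.shape[1]   # columns only

print(shape)           # (6, 3)
print(n_rows, n_cols)  # 6 3
```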



2) To Retrieve Rows from Data Frame: -

  • There are two methods to retrieve rows from a data frame:

                a. head( ): It retrieves the first 5 rows of the data frame by default.

                b. tail( ): It retrieves the last 5 rows of the data frame by default.

  • To display a different number of rows, pass that number as an argument. For example, head(2) displays the first 2 rows.

            Example: -


            output: -
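
A short sketch of head( ) and tail( ), again assuming a hypothetical six-row data frame df built in memory:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John", "Sara", "Kiran"],
    "Marks": [82, 45, 67, 91, 38, 74],
})

first_five = df.head()   # first 5 rows (default)
last_five = df.tail()    # last 5 rows (default)
first_two = df.head(2)   # first 2 rows only

print(first_two)
```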



3) To Retrieve a Range of Rows: -

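  • We can also treat the data frame as a sequence of rows and retrieve a range of rows from it using slicing.
  • Row positions start at 0 and the end of a slice is excluded, so to display the 2nd to 4th rows we slice from position 1 up to position 4: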

            Example: -


            output: -
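
The slice can be sketched as follows, assuming a hypothetical data frame df with the default 0-based integer index:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John", "Sara", "Kiran"],
    "Marks": [82, 45, 67, 91, 38, 74],
})

# Positions 1, 2 and 3, i.e. the 2nd, 3rd and 4th rows.
rows_2_to_4 = df[1:4]

print(rows_2_to_4)
```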



4) To Retrieve Column Names: -

  • To retrieve the column names from the data frame, we can use the columns attribute:

            Example: -


            output: -
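
A minimal sketch of the columns attribute, assuming a hypothetical data frame df with three made-up columns:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena"],
    "Marks": [82, 45, 67],
    "Result": ["Pass", "Fail", "Pass"],
})

cols = df.columns          # an Index object holding the column names
col_names = list(cols)     # plain Python list of names

print(col_names)           # ['Name', 'Marks', 'Result']
```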




5) To Retrieve Multiple Columns: -

  • We can also retrieve the data of multiple columns by passing a list of column names as the subscript to the data frame:

            Example: -



            output: -
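
For instance, assuming a hypothetical data frame df with Name, Marks and Result columns, two of the columns can be selected like this:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena"],
    "Marks": [82, 45, 67],
    "Result": ["Pass", "Fail", "Pass"],
})

# Note the double brackets: the inner pair is a Python list of column names.
subset = df[["Name", "Result"]]

print(subset)
```

The result is a new data frame that contains only the requested columns, in the order given in the list.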



Reshaping Data: -

  • We can easily reshape the data by categorizing the values of a specific column.
  • Here we will categorize the "Result" column, i.e. convert its Pass and Fail values into numbers.

            Example: -



            output: -
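
One possible way to do this is with the map( ) method of a column. The sketch below assumes a hypothetical data frame df and an assumed encoding of Pass as 1 and Fail as 0; the encoding actually intended may differ.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John"],
    "Result": ["Pass", "Fail", "Pass", "Fail"],
})

# Assumed encoding: Pass -> 1, Fail -> 0.
df["Result"] = df["Result"].map({"Pass": 1, "Fail": 0})

print(df)
```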


Handling missing data: -

  • In many cases, the data received from various sources may not be perfect, and some values may be missing.
  • For example, suppose the 'Emp.csv' file contains data where the employee name is missing in one row and the salary is missing in another row.



To handle this, the fillna( ) method can be used to replace NaN values with a specified value, such as 0.

            Example: -


            Output: -
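
A minimal sketch of fillna( ), assuming a hypothetical in-memory data frame emp that stands in for 'Emp.csv' (the real file contents are not known here). A per-column dictionary is used so that the text column gets a text placeholder and the numeric column gets 0; passing a single value, such as fillna(0) on the Salary column, is the simpler form.

```python
import pandas as pd

# Hypothetical stand-in for Emp.csv; None marks the missing values.
emp = pd.DataFrame({
    "EmpId": [1, 2, 3, 4],
    "Name": ["Asha", None, "Meena", "John"],
    "Salary": [50000.0, 42000.0, None, 61000.0],
})

# Replace missing values column by column ("Unknown" is an assumed placeholder).
filled = emp.fillna({"Name": "Unknown", "Salary": 0})

# Single-value form applied to one numeric column.
salary_filled = emp["Salary"].fillna(0)

print(filled)
```

fillna( ) returns a new object by default, so emp itself still contains the missing values.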


If we do not want to keep rows that contain missing data, we can remove those rows with the dropna( ) method.

            Example: -


            Output: -
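
A minimal sketch of dropna( ), again assuming a hypothetical in-memory data frame emp in place of 'Emp.csv':

```python
import pandas as pd

# Hypothetical stand-in for Emp.csv; None marks the missing values.
emp = pd.DataFrame({
    "EmpId": [1, 2, 3, 4],
    "Name": ["Asha", None, "Meena", "John"],
    "Salary": [50000.0, 42000.0, None, 61000.0],
})

# Drop every row that has at least one missing value.
cleaned = emp.dropna()

print(cleaned)
```

Only the rows with no missing value remain, and they keep their original index labels.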


Data indexing and selection: -

  • Indexing means selecting particular rows and columns of data from a DataFrame.
  • Pandas supports the following types of indexing:

                1. Using the indexing operator [ ]

                2. Indexing a DataFrame using .loc[ ]

                3. Indexing a DataFrame using .iloc[ ]



1) Using the indexing operator [ ]: -

  • The indexing operator refers to the square brackets following an object. With a column name inside the brackets, it selects that column from the data frame.

Example: -


Output: -
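
A minimal sketch of the indexing operator, assuming a hypothetical data frame df:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena"],
    "Marks": [82, 45, 67],
})

# A single column name in [ ] returns that column as a Series.
names = df["Name"]

print(names)
```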


2) Indexing a DataFrame using .loc[ ]: -

  • The .loc[ ] indexer selects data by the labels of the rows and columns.
  • Unlike the plain indexing operator, df.loc takes row labels first and, optionally, column labels second.
  • It can therefore select subsets of rows and columns at the same time.

            Example: -


            Output: -
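
A minimal sketch of .loc[ ], assuming a hypothetical data frame df whose Name column is used as the row labels:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John"],
    "Marks": [82, 45, 67, 91],
    "Result": ["Pass", "Fail", "Pass", "Pass"],
}).set_index("Name")   # use the names as row labels

# One row, selected by its label.
ravi = df.loc["Ravi"]

# A subset of rows and columns, both selected by label.
subset = df.loc[["Asha", "Meena"], ["Marks"]]

print(ravi)
print(subset)
```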



3) Indexing a DataFrame using .iloc[ ]: -

  • The .iloc[ ] indexer allows us to retrieve rows and columns by position.
  • The df.iloc indexer is very similar to df.loc, but it uses only integer positions (starting at 0) to make its selection.
  • To select multiple rows, we can pass a list of integers to .iloc[ ].

             Example: -



            Output: -
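
A minimal sketch of .iloc[ ], assuming a hypothetical data frame df:

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John", "Sara"],
    "Marks": [82, 45, 67, 91, 38],
    "Result": ["Pass", "Fail", "Pass", "Pass", "Fail"],
})

first_row = df.iloc[0]            # single row by position
some_rows = df.iloc[[0, 2, 4]]    # several rows from a list of positions
block = df.iloc[0:2, 0:2]         # first 2 rows and first 2 columns

print(some_rows)
```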



Hierarchical indexing: -

  • Hierarchical indexing is also known as multi-indexing.
  • Multi-indexing is an advanced indexing technique in which a DataFrame has more than one level of index.
  • Here we use the performance.csv file.

                Example: -



                Output: -
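
A minimal sketch of hierarchical indexing. The columns of performance.csv are not known here, so the sketch assumes a hypothetical in-memory data frame with made-up Year, Quarter and Sales columns and builds the two-level index with set_index( ):

```python
import pandas as pd

# Hypothetical data; these are NOT the contents of performance.csv.
perf = pd.DataFrame({
    "Year": [2022, 2022, 2023, 2023],
    "Quarter": ["Q1", "Q2", "Q1", "Q2"],
    "Sales": [100, 120, 130, 150],
})

# Two columns together form a two-level (hierarchical) index.
indexed = perf.set_index(["Year", "Quarter"])

rows_2023 = indexed.loc[2023]                    # all rows of the outer label 2023
q2_2023 = indexed.loc[(2023, "Q2"), "Sales"]     # one value via an (outer, inner) tuple

print(indexed)
```

Selecting with only the outer label returns every row under it, while a tuple of labels narrows the selection to a single row.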







