4.2 Operations on Data Frames
Operations on Data Frames: -
- Operations on data frames help us analyze and manipulate data.
- Pandas provides several useful operations for data frames.
1) To Find Number of Rows and Columns: -
- We can find the number of rows and columns in the data frame by using the shape attribute.
- It returns a tuple that contains the number of rows followed by the number of columns.
- If we want only the number of rows or only the number of columns, we can read that value from the tuple (index 0 for rows, index 1 for columns).
Example: -
output: -
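The example that belonged here is not preserved in these notes, so the following is a minimal sketch of reading the shape attribute. The data frame contents (the Name, Marks and Result columns and their values) are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data; the notes do not show their own data set.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John", "Sara", "Vikram"],
    "Marks": [82, 35, 67, 91, 28, 74],
    "Result": ["Pass", "Fail", "Pass", "Pass", "Fail", "Pass"],
})

shape = df.shape      # tuple of (number of rows, number of columns)
rows = shape[0]       # number of rows only
cols = shape[1]       # number of columns only

print(shape)          # (6, 3)
print(rows, cols)     # 6 3
```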
2) To Retrieve Rows from Data Frame: -
- There are two methods to retrieve rows from a data frame:
a. head( ): It retrieves the first 5 rows from the data frame by default.
b. tail( ): It retrieves the last 5 rows from the data frame by default.
- Both methods accept the number of rows as an argument. To display the first 2 rows, we can pass 2 to the head( ) method, as in df.head(2).
Example: -
output: -
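A short sketch of head( ) and tail( ), assuming a small hypothetical data frame of six rows (the column names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data with six rows.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John", "Sara", "Vikram"],
    "Marks": [82, 35, 67, 91, 28, 74],
})

first_five = df.head()    # first 5 rows (default)
last_five = df.tail()     # last 5 rows (default)
first_two = df.head(2)    # first 2 rows only

print(first_five)
print(last_five)
print(first_two)
```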
3) To Retrieve a Range of Rows: -
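- We can also retrieve rows from the data frame using slicing, in the same way as slicing a Python list.
- For example, to display the 2nd to 4th rows of a data frame df, we can write df[1:4]. Positions start at 0 and the end position is excluded.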
Example: -
output: -
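The slicing described above can be sketched as follows. The data is hypothetical; only the slice expression matters.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John", "Sara", "Vikram"],
    "Marks": [82, 35, 67, 91, 28, 74],
})

# Positions 1, 2 and 3, i.e. the 2nd, 3rd and 4th rows.
middle = df[1:4]
print(middle)
```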
4) To Retrieve Column Names: -
- To retrieve the column names from the data frame, we can use the columns attribute, as in df.columns.
Example: -
output: -
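A minimal sketch of reading the column names, again on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena"],
    "Marks": [82, 35, 67],
    "Result": ["Pass", "Fail", "Pass"],
})

print(df.columns)                 # an Index object holding the column labels
column_names = list(df.columns)   # plain Python list of the names
print(column_names)               # ['Name', 'Marks', 'Result']
```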
5) To Retrieve Multiple Columns: -
- We can also retrieve the data of multiple columns by passing a list of column names as the subscript to the data frame (note the double square brackets).
Example: -
output: -
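Selecting several columns at once might look like this. The column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena"],
    "Marks": [82, 35, 67],
    "Result": ["Pass", "Fail", "Pass"],
})

# The inner brackets build a list of names; the outer brackets index the data frame.
subset = df[["Name", "Result"]]
print(subset)
```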
Reshaping Data: -
- We can reshape the data by recoding the categories of a specific column.
- Here we categorize the "Result" column, i.e. we convert its Pass and Fail values into numeric form.
Example: -
output: -
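The notes do not preserve the code used for this step. One common way to recode categories is Series.map( ) with a dictionary, sketched below. The mapping of Pass to 1 and Fail to 0 is an assumption, as is the sample data.

```python
import pandas as pd

# Hypothetical sample data with a "Result" column.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John"],
    "Result": ["Pass", "Fail", "Pass", "Fail"],
})

# Assumed encoding: Pass -> 1, Fail -> 0.
df["Result"] = df["Result"].map({"Pass": 1, "Fail": 0})
print(df)
```

Any value that is not a key of the dictionary becomes NaN, so the dictionary should cover every category present in the column.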
Handling missing data: -
- In many cases, the data received from various sources is not perfect and some values are missing.
- For example, the 'Emp.csv' file contains data in which the employee name is missing in one row and the salary is missing in another row. Pandas represents such missing values as NaN.
- To handle this, the fillna( ) method can be used to replace all NaN values with a specified value, such as 0.
Example: -
Output: -
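Since the contents of 'Emp.csv' are not shown, the sketch below builds a small stand-in data frame in memory instead of reading the file. The column names (EmpId, Name, Salary) and all values are hypothetical. The Name column is created with object dtype as a precaution, so that a numeric fill value is accepted in a text column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Emp.csv: one name and one salary are missing.
emp = pd.DataFrame({
    "EmpId": [101, 102, 103, 104],
    "Name": pd.Series(["Asha", np.nan, "Meena", "John"], dtype=object),
    "Salary": [50000.0, 42000.0, np.nan, 61000.0],
})

# fillna() returns a new data frame; emp itself is left unchanged.
filled = emp.fillna(0)
print(filled)
```

Filling a name with 0 is rarely meaningful in practice; fillna( ) also accepts a dictionary of per-column values, for example emp.fillna({"Name": "Unknown", "Salary": 0}).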
- If we don't want the missing data and want to remove those rows, we can use the dropna( ) method.
Example: -
Output: -
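A minimal sketch of dropna( ), using a hypothetical in-memory stand-in for the employee data (column names and values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical employee data: row 1 lacks a name, row 2 lacks a salary.
emp = pd.DataFrame({
    "EmpId": [101, 102, 103, 104],
    "Name": ["Asha", None, "Meena", "John"],
    "Salary": [50000.0, 42000.0, np.nan, 61000.0],
})

# By default, dropna() removes every row that has at least one missing value.
cleaned = emp.dropna()
print(cleaned)
```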
Data indexing and selection: -
- Indexing means selecting particular rows and columns of data from a DataFrame.
- The following 3 ways of indexing are covered here:
1. Using indexing operator [ ]
2. Indexing a DataFrame using .loc[ ]
3. Indexing a DataFrame using .iloc[ ]
1) Using indexing operator [ ]: -
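- The indexing operator refers to the square brackets following an object.
- Passing a column name inside the brackets selects that column; passing a list of column names selects several columns.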
Example: -
Output: -
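A short sketch of the indexing operator on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena"],
    "Marks": [82, 35, 67],
    "Result": ["Pass", "Fail", "Pass"],
})

names = df["Name"]             # a single column, returned as a Series
pair = df[["Name", "Marks"]]   # a list of columns, returned as a DataFrame

print(names)
print(pair)
```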
2) Indexing a DataFrame using .loc[ ]: -
- This indexer selects data by the labels of the rows and columns.
- Unlike the plain indexing operator, df.loc takes row labels first and, optionally, column labels second.
- It can select a single row, a single value, or a subset of rows and columns.
Example: -
Output: -
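Label-based selection with .loc[ ] might look like the sketch below. The row labels (student names) and the columns are hypothetical.

```python
import pandas as pd

# Hypothetical data; the row labels are student names.
df = pd.DataFrame(
    {"Marks": [82, 35, 67], "Result": ["Pass", "Fail", "Pass"]},
    index=["Asha", "Ravi", "Meena"],
)

one_row = df.loc["Ravi"]                                # one row, by label
cell = df.loc["Ravi", "Marks"]                          # one value: row label, column label
subset = df.loc[["Asha", "Meena"], ["Marks", "Result"]] # chosen rows and columns

print(one_row)
print(cell)     # 35
print(subset)
```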
3) Indexing a DataFrame using .iloc[ ]: -
- This indexer allows us to retrieve rows and columns by position.
- The df.iloc indexer is very similar to df.loc but uses only integer positions (starting at 0) to make its selection.
- To select multiple rows, we can pass a list of integers to .iloc[ ].
Example: -
Output: -
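Position-based selection with .iloc[ ] can be sketched as follows, on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "John"],
    "Marks": [82, 35, 67, 91],
    "Result": ["Pass", "Fail", "Pass", "Pass"],
})

first = df.iloc[0]           # first row, by position
picked = df.iloc[[0, 2]]     # rows at positions 0 and 2
block = df.iloc[0:2, 0:2]    # first two rows and first two columns

print(first)
print(picked)
print(block)
```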
Hierarchical indexing: -
- Hierarchical indexing is also known as multi-indexing.
- Multi-indexing is an advanced indexing technique in which a DataFrame has two or more levels of index, so each row is identified by a combination of labels.
- Here we use the performance.csv file.
Example: -
Output: -
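The contents of performance.csv are not shown in these notes, so the sketch below uses a hypothetical in-memory stand-in with made-up columns (Year, Term, Score). It builds a two-level index with set_index( ), which is one common way to create a multi-index.

```python
import pandas as pd

# Hypothetical stand-in for performance.csv.
perf = pd.DataFrame({
    "Year": [2022, 2022, 2023, 2023],
    "Term": ["T1", "T2", "T1", "T2"],
    "Score": [71, 78, 83, 88],
})

# Passing a list of columns to set_index() creates a hierarchical (multi-level) index.
perf = perf.set_index(["Year", "Term"])
print(perf)

year_2023 = perf.loc[2023]               # all rows under the outer label 2023
score = perf.loc[(2023, "T2"), "Score"]  # one value, addressed by both levels
print(year_2023)
print(score)    # 88
```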