One of the most common issues that data analysts face is missing or unknown values in their data. These values are usually represented as NaN ("Not a Number"), and they can cause errors in calculations and statistical analysis. In Python, Pandas is a widely used data analysis library that provides a variety of tools for handling missing data. The dropna() function of the Pandas module is used to drop/remove rows with NaN values.
This post explains how to drop rows that have NaN values from a Pandas DataFrame. It covers the following topics:
- Using df.dropna() Function
- Drop Rows With NaN Values
- Drop Rows With NaN Values and Reset Index
- Drop Rows With NaN Values From Selected Columns
Using df.dropna() Function
To remove missing values from a DataFrame in Python, use the “df.dropna()” function of the Pandas module. By default, it removes every row that contains at least one missing value (NaN, None, or NaT) and returns a new DataFrame.
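The short sketch below, using small hypothetical data, illustrates which values “dropna()” treats as missing: both None and NaN count, while an empty string or a zero is an ordinary value and is kept.

```python
import numpy
import pandas

# Hypothetical DataFrame mixing different kinds of "empty-looking" values.
df = pandas.DataFrame({
    "name": ["a", None, "c", ""],
    "score": [1.0, 2.0, numpy.nan, 0.0],
})

# None (row 1) and NaN (row 2) are reported as missing; "" and 0.0 are not.
print(df.isna())

# Default dropna(): rows 1 and 2 are removed, rows 0 and 3 survive.
cleaned = df.dropna()
print(cleaned)
```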
The syntax for “dropna()” is shown below:
DataFrameName.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
In the above syntax:
- The “axis” parameter determines whether rows or columns are dropped: “axis=0” (or 'index', the default) drops rows that contain missing values, and “axis=1” (or 'columns') drops columns that contain missing values.
- The “how” parameter specifies when a row or column is dropped. With the default ‘any’, a row or column is dropped if any of its values are missing; with ‘all’, it is dropped only if all of its values are missing.
- The “thresh” parameter specifies the minimum number of non-null values that a row or column must have to be kept.
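- The “subset” parameter lists the labels along the other axis to check for missing values; for example, when dropping rows, it specifies which columns to consider.
- The “inplace” parameter defines whether to modify the DataFrame in place (the method then returns None) or to return a new DataFrame (the default, inplace=False).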
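To make the parameter descriptions concrete, here is a minimal sketch on a hypothetical DataFrame in which column “c” is entirely missing and row 3 is entirely missing. It shows “how”, “axis”, “thresh”, and “inplace” one at a time; “how” and “thresh” are deliberately not combined in a single call.

```python
import numpy
import pandas

# Hypothetical data: column "c" is all NaN, and row 3 is all NaN.
df = pandas.DataFrame({
    "a": [1.0, numpy.nan, 3.0, numpy.nan],
    "b": [4.0, 5.0, numpy.nan, numpy.nan],
    "c": [numpy.nan, numpy.nan, numpy.nan, numpy.nan],
})

# how='any' (default): every row has a NaN in "c", so every row is dropped.
any_rows = df.dropna()

# how='all': only rows whose values are ALL missing are dropped (row 3).
all_rows = df.dropna(how="all")

# axis=1 with how='all': drop columns that are entirely missing (column "c").
no_empty_cols = df.dropna(axis=1, how="all")

# thresh=2: keep rows with at least 2 non-missing values (only row 0).
at_least_two = df.dropna(thresh=2)

# inplace=True: modifies df_copy itself and returns None.
df_copy = df.copy()
result = df_copy.dropna(axis=1, how="all", inplace=True)

print(all_rows)
print(no_empty_cols)
print(at_least_two)
```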
Example 1: Drop Rows With NaN Values
The code below drops every row that contains at least one NaN value in any column of the DataFrame:
Code:
import pandas
data_frame = pandas.DataFrame({'Marks_1': ['15','ABC','20','XYZ','30'],
'Marks_2': ['YYY','10','35','40','150']
})
data_frame = data_frame.apply(pandas.to_numeric, errors='coerce')
print(data_frame)
data_frame = data_frame.dropna()
print('\n',data_frame)
- The module named “pandas” is imported.
- The “pandas.DataFrame()” function creates the DataFrame.
- The “data_frame.apply()” function applies “pandas.to_numeric()” to each column to convert its values to numbers. The parameter “errors='coerce'” converts any value that cannot be parsed as a number to NaN.
- Lastly, the “dropna()” function is used to remove any rows that contain NaN values.
Output:
Rows 0, 1, and 3 each contained at least one NaN value and have been dropped. Only rows 2 and 4 remain, and they keep their original index labels.
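One way to picture what the default “dropna()” call does is as a boolean filter that keeps only rows where every value is present. The sketch below rebuilds Example 1 with “notna().all(axis=1)” and checks that the result matches “dropna()”. This is an illustration of the behavior, not how Pandas implements it internally.

```python
import pandas

data_frame = pandas.DataFrame({'Marks_1': ['15', 'ABC', '20', 'XYZ', '30'],
                               'Marks_2': ['YYY', '10', '35', '40', '150']})
data_frame = data_frame.apply(pandas.to_numeric, errors='coerce')

# True for rows whose values are all present (no NaN in any column).
complete_rows = data_frame.notna().all(axis=1)
via_mask = data_frame[complete_rows]
via_dropna = data_frame.dropna()

print(complete_rows.tolist())  # [False, False, True, False, True]
print(via_mask.equals(via_dropna))  # True
```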
Example 2: Drop Rows With NaN Values and Reset Index
Because “dropna()” keeps the original index labels, the remaining rows have gaps in their index. The following code drops rows with NaN values and then resets the index of the returned DataFrame:
Code:
import pandas
data_frame = pandas.DataFrame({'Marks_1': ['15','ABC','20','XYZ','30'],
'Marks_2': ['YYY','10','35','40','150']
})
data_frame = data_frame.apply(pandas.to_numeric, errors='coerce')
print(data_frame)
data_frame = data_frame.dropna()
data_frame = data_frame.reset_index(drop=True)
print('\n',data_frame)
- The module named “pandas” is imported.
- The “pandas.DataFrame()” function creates the DataFrame based on the given data.
- The “df.apply()” function converts the non-numeric values into “NaN” values.
- The “df.dropna()” function drops the rows containing the “NaN” values.
- Finally, “df.reset_index(drop=True)” renumbers the rows starting from 0. The “drop=True” argument discards the old index instead of inserting it into the DataFrame as a new column.
Output:
The remaining rows are now labeled 0 and 1 instead of keeping their original labels 2 and 4.
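The “drop=True” argument matters here. The sketch below compares “reset_index()” with and without it on the Example 2 data: without it, the old labels are preserved in a new column named “index”; with it, they are simply discarded.

```python
import pandas

data_frame = pandas.DataFrame({'Marks_1': ['15', 'ABC', '20', 'XYZ', '30'],
                               'Marks_2': ['YYY', '10', '35', '40', '150']})
data_frame = data_frame.apply(pandas.to_numeric, errors='coerce').dropna()

# drop=False (default): old labels 2 and 4 become a new "index" column.
kept = data_frame.reset_index()

# drop=True: old labels are thrown away and rows are renumbered 0, 1.
renumbered = data_frame.reset_index(drop=True)

print(kept)
print(renumbered)
```

On pandas 2.0 and later, “dropna()” also accepts “ignore_index=True”, which produces the renumbered result in a single call.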
Example 3: Drop Rows With NaN Values From Selected Columns
The code block below removes only the rows that contain NaN values in selected columns:
Code:
import pandas
data_frame = pandas.DataFrame({'Marks_1': ['15','ABC','20','XYZ','30'],
'Marks_2': ['YYY','10','35','40','150'],
'Marks_3': ['1AA','23','45','RT','546']
})
data_frame = data_frame.apply(pandas.to_numeric, errors='coerce')
print(data_frame)
data_frame = data_frame.dropna(subset=['Marks_2', 'Marks_3'])
data_frame = data_frame.reset_index(drop=True)
print('\n',data_frame)
- The “pandas.DataFrame()” function creates a DataFrame containing three columns: “Marks_1”, “Marks_2”, and “Marks_3”.
- The “apply()” function converts all non-numeric values into NaN values.
- The “dropna()” function, called with “subset=['Marks_2', 'Marks_3']”, removes only the rows that contain NaN in at least one of those two columns; NaN values in “Marks_1” are ignored.
- The index is reset by using the “df.reset_index()” function.
Output:
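Rows 0 and 3 have been dropped because they contain NaN in “Marks_2” or “Marks_3”. Row 1 is kept even though its “Marks_1” value is NaN, because “Marks_1” is not in the subset.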
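The “subset” parameter can be combined with “how”. In the sketch below, which reuses the Example 3 data, the default how='any' drops a row if either selected column is NaN, while how='all' drops a row only when both selected columns are NaN.

```python
import pandas

data_frame = pandas.DataFrame({'Marks_1': ['15', 'ABC', '20', 'XYZ', '30'],
                               'Marks_2': ['YYY', '10', '35', '40', '150'],
                               'Marks_3': ['1AA', '23', '45', 'RT', '546']})
data_frame = data_frame.apply(pandas.to_numeric, errors='coerce')

# how='any' (default): drop a row if Marks_2 OR Marks_3 is NaN (rows 0 and 3).
any_missing = data_frame.dropna(subset=['Marks_2', 'Marks_3'])

# how='all': drop a row only if Marks_2 AND Marks_3 are both NaN (row 0 only).
all_missing = data_frame.dropna(subset=['Marks_2', 'Marks_3'], how='all')

print(any_missing.index.tolist())  # [1, 2, 4]
print(all_missing.index.tolist())  # [1, 2, 3, 4]
```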
Conclusion
The “df.dropna()” function is used in Python to drop rows with NaN values from an entire Pandas DataFrame or only from specified columns (via the “subset” parameter). Because the remaining rows keep their original index labels, “df.reset_index(drop=True)” can be called afterward to renumber them. The “axis”, “how”, and “thresh” parameters give finer control over what is dropped. This guide has presented a brief overview of how to drop rows with NaN values.