Convert DataFrame column type from string to DateTime

Encountering time series data in Python is quite common. The time series data can be handled by a powerful tool, Pandas. Oftentimes a situation is encountered where it is required to convert the dataframe column type from string to DateTime. For instance, a series-based operation can only be applied if the date is in the proper format. To deal with such scenarios, several methods are used in Python.

The article discusses different methods that can convert the DataFrame column type from string to date time format with appropriate examples.

How to Convert a DataFrame Column Type From String to DateTime?

Pandas is a Python library that contains different functionalities enabling it to handle the data effectively. Data is mostly present in the form of data frames that consist of rows and columns containing the data in them which can be of any type. The data type of the data can be modified and in this article, we will specifically discuss the following methods to convert the string datatype in a data frame to the datetime format.

  1. Using DataFrame.astype() Function
  2. Using pandas.to_datetime() Function

Let us discuss each method in detail.

Method 1: Using DataFrame.astype() Function

In Python, the “DataFrame.astype()” function alters the data type of the selected column according to the specified data type. The following code explains how the DataFrame column type can be changed from a string to DateTime format:

import pandas as pd 
dframe = pd.DataFrame({'Award Show': ['Grammy Awards', 'MTV Music Awards', 'Emmy Awards'],
              'Date': ['2/13/2023', '5/8/2022', '9/17/2022'],
              'Time': ['8:00 PM', '7:30 PM', '8:00 PM']}) 
print (dframe, ' \n \n') 
dframe.info()

In the above code, 

  • The panda library is imported as pd
  • pd.DataFrame creates a data frame with the column representing the name of the award show, and what date and time it happened.
  • The data frame and its respective information are printed.

Output

The data frame is printed that displays the award show, its date and time. In the information, it shows that the data type of the Date is an object which is a string. The following output displays the data frame being created and the data type of each column:

Let’s implement the “.astype()” function to convert the data type of the Date object from a string to DateTime format with the following code:

dframe['Date'] = dframe['Date'].astype('datetime64[ns]') 
print (dframe, ' \n \n')
dframe.info()

In the above code,

  • The astype() function is called on the column Date of the data frame dframe. The parameter value of the function suggests that the Date column will be changed from the string to the DateTime format. 
  • The updated data frame and its respective information are printed.

Output

The data type of the column name “Date” has been successfully altered from object(string) to DateTime. There is also an addition of the data type datetime64[ns](1) in dtype which suggests that a data type has been added. The following output displays how a data frame column type from a string can be changed to datetime using astype() method:

Method 2: Using pandas.to_datetime() Function

Pandas provides a “to_datetime()” function that accepts a DateTime value in string format and converts it into a Python’s DateTime object. The following code illustrates how the string DateTime can be obtained through pandas.to_datetime() Function: 

import pandas as pd 
dframe = pd.DataFrame({'Date':['12/08/2023', '9/21/2022', '6/30/2023'], 
                'Tour':['Eras tour', 'Reputation tour', 'Blinkies tour'], 
                'Ticket price ':[12000, 10000, 14000]}) 
print (dframe, ' \n \n') 
dframe.info()

In the above code,

  • The pandas’ library is imported as pd.
  • A data frame is made using pd.DataFrame which keeps the data about when the tour will start, the name of the tour, and the ticket price for each tour.
  • The data frame is then printed.
  • The info() method is called on the dframe that gives information about the range index, columns, data type of each column, number of elements in the column, and memory usage.

Output

The output displays the data frame created in the code above. Here, we can observe that the data type of date is an object, not DateTime. pandas.to_datetime() is used to convert the data type into a datetime format which is implemented later. The following output displays the data frame created and its information:

Now, to implement the DateTime format, the “pd.to_datetime()” will be called on dframe[‘Date’] which will change its data type to DateTime. Here is the implementation of the pd.to_datetime() function:

dframe['Date'] = pd.to_datetime(dframe['Date'])
print (dframe, ' \n \n')
dframe.info()

In the above code, 

  • The pd.to_datetime() is called on the ‘Date’ object of the data frame dframe and saved in dframe[‘Date’].
  • The updated DataFrame is printed and before getting the required information about it, two-line spaces have been given for more readability
  • The info() method is called on the dframe.

Output

Note that corresponding to the Date, the Dtype is datetime64[ns]. The above code converts the Date object to the DateTime format which is displayed in the output given below:

Use Case

One use case of such conversion is that the yymmdd format can be changed to the yyyymmdd format.

Let us discuss different examples in which the yymmdd format is changed to yyyymmdd format.

Example 1: Converting the yymmdd Format to yyyymmdd Format

The following code changes the yymmdd format to yyyymmdd format by modifying the format parameter in pd.to_datetime:

import pandas as pd
attendee_data = [['200712', 550000, 'Eras tour'],
                ['200714', 541000, 'Reputation tour'],
                ['200716', 61500, 'Blinkies tour'],
                ['200719', 78000, 'Eras tour'],
                ['200721', 90000, 'Black wind tour'],
                ['200724', 63000, 'South  tour'],
                ['200729',90000, 'XYZ tour']]
dframe = pd.DataFrame(attendee_data,columns=['Dates', 'Attendees', 'Tour'])
print(dframe, '\n \n')
print(dframe.dtypes)

In the above code,

  • The pandas’ library is imported
  • The attendee_data is a nested list which contains different sublists.
  • The nested list is converted to a data frame using pd.DataFrame where the first element of each column represents the dates, the second element of each sublist represents the number of attendees and the third element in each sublist gives information about the tour.
  • The dataframe’s columns along with the respective data types are printed.

Output

The dates are given in yymmdd format where the type of the selected column which is “Dates” is an object i.e. string. The following snippet illustrates the data frame being created while the dates are in yymmdd format:

To obtain the yyyymmdd format, the format parameter is given the value %y%m%d and the following code shows its demonstration:

dframe['Dates'] = pd.to_datetime(dframe['Dates'], format='%y%m%d')
print('\n \n')
print(dframe)
print('\n' )
print(dframe.dtypes)

Output

Here, the format has been changed from yymmdd to yyyymmd. The data type of the Dates column has been changed from object i.e. string to the datetime format. The following output displays the updated data frame with dates in yyyymmdd format:

Example 2: Converting Multiple Columns of yymmmdd Into yyyymmdd Format

More than one column in the DataFrame can be altered from string to DateTime format by making changes in the format parameter of the pandas to_datetime() function:

import pandas as pd
attendee_data = [['200712', 50000, 'Eras', '201213'],
                ['200714', 51000, 'Reputation', '210715'],
                ['200716', 51500, 'Blinkies', '210716'],
                ['200719', 53000, 'Eras', '201231'],
                ['200721', 54000, 'Reputation', '210715'],
                ['200724', 55000, 'Blinkies', '210716'],
                ['200729', 57000, 'Eras', '201231']]
dframe = pd.DataFrame(attendee_data,columns=['Tour_started', 'Attendees','tour', 'Tour_ended'])
print(dframe, '\n \n')
print(dframe.dtypes)

In the above code,

  • The pandas’ library is imported.
  • The same nested list is used as in the above example but another element in its sublist has been added. Now each first element of the sublists represents the starting date of the tour and the last element of each sublist represents when the tour will end.
  • The data frame and the information about its data types are printed.

Output

The type of the Tour_started and Tour_ended columns is the object which is a string. The following output displays the ending and starting dates of the tour which are in the yymmdd format:

Now to convert the dates from yymmd to yyyymmdd format, the format parameter in the pd.to_datetime() function of each column containing the dates will be modified. The following code shows how the date format is changed in the columns Tour_started and Tour_ended:

dframe['Tour_started'] = pd.to_datetime(
dframe['Tour_started'],
format='%Y%m%d'
)
dframe['Tour_ended'] = pd.to_datetime(
dframe['Tour_ended'],
format='%Y%m%d'
)
print(dframe)
print('\n \n')
print(dframe.dtypes)

Output

The data types of the columns Tour_started and Tour ended have been changed from object(string) to DateTime. The following output displays that the dates in the columns Tour_started and Tour_ended have been changed to yyyymmdd format:

Why Choose the YYYYMMDD Format?

YYYYMMDD format is chosen for greater readability. Dates that are alphabetically in sequence are also placed with a chronological harmony which makes the sorting of the dates really convenient.

Conclusion

In Python programming, using pd.to_datetime() and DataFrame.astype() can convert the data frame column type from string to DateTime. It can be used to convert the format of the date object from yymmdd to yyyymmdd, which improves readability and helps in the sorting of date data. The article discusses different methods to convert the DataFrame column type from string to DateTime and illustrates each method with an example.