How to Substring Only the First Column in awk in Linux?

The “awk” command is a standard Linux utility and scripting language widely used to process and manipulate text. It supports tasks such as displaying selected fields, searching and replacing text, extracting substrings, and many others.

One of its most commonly used features is substring extraction with the “substr” function. It returns part of a string, such as a column value, based on a starting position and an optional length.

This guide illustrates the complete procedure to extract a substring from only the first column using the “awk” command.

The “substr” function returns a portion of a string based on the starting position and length provided by the user.

Syntax:

The generalized syntax of the “substr” function is typed below:

$ awk '{print substr(string,start,length)}' filename

The above syntax contains the following parameters:

  • awk: Represents the main “awk” command line tool.
  • print: Displays the value returned by the “substr” function.
  • substr: Denotes the “substring” function.
  • string: Specifies the source string from which the substring is extracted, e.g., “$1” for the first column.
  • start: Defines the position of the first character to extract, counted from “1”.
  • length: Identifies the number of characters to extract. It is optional; if omitted, the rest of the string is returned.
  • filename: Shows the file from which the input is read.
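The parameters above can be tried without any input file by calling “substr” inside a “BEGIN” block. This is only a minimal sketch; the literal string “substring” and the variable names “first” and “rest” are chosen here for illustration:

```shell
# substr(string, start, length): character positions are counted from 1.
first=$(awk 'BEGIN { print substr("substring", 1, 3) }')
rest=$(awk 'BEGIN { print substr("substring", 4, 6) }')

echo "$first"   # sub
echo "$rest"    # string
```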

Let’s see how to use the “substr” function of the “awk” command to extract a substring from only the first column.

The “home” directory contains a “script.txt” file, which can be opened in the “nano” text editor to view its content:

$ nano script.txt

The “script.txt” file has three columns “Name”, “Age”, and “ID”.

Example 1: Print the Substring of the First Column (Start and End Point Defined)

Execute the “substr” function of the “awk” command to display a substring of the first column, “Name”, starting at character “2” with a length of “3” characters:

$ awk '{print substr($1,2,3)}' script.txt

The “$1” specifies the column number, i.e., the first column, “Name”. For each line, a three-character substring of the first column has been displayed, beginning at its second character.
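Since the file content is not reproduced in the text, the effect can be checked with a small stand-in file. This is a sketch under assumptions: the rows below are hypothetical sample data, not the actual content of “script.txt”, and the variable names are illustrative:

```shell
# Hypothetical sample data; the real content of script.txt is not shown in this guide.
sample=$(mktemp)
printf '%s\n' 'Name Age ID' 'Johnson 25 101' 'Williams 30 102' > "$sample"

# Take 3 characters of column 1, starting at character 2.
out=$(awk '{ print substr($1, 2, 3) }' "$sample")
printf '%s\n' "$out"
# ame
# ohn
# ill

rm -f "$sample"
```

The header row is processed like any other line. Adding a condition such as “NR > 1” before the action would skip it.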

Example 2: Print the Substring of the First Column (Starting Point Defined Only)

The user can also omit the “length” parameter from the “substr” function in the following way:

$ awk '{print substr($1,2)}' script.txt

This time, each string in the “Name” column has been displayed from its second character to the end.
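The two-argument form can be sketched in the same way. Again, the rows are hypothetical sample data used only for illustration:

```shell
# Hypothetical sample data; the real content of script.txt is not shown in this guide.
sample=$(mktemp)
printf '%s\n' 'Name Age ID' 'Johnson 25 101' 'Williams 30 102' > "$sample"

# No length given: everything from character 2 to the end of column 1.
out=$(awk '{ print substr($1, 2) }' "$sample")
printf '%s\n' "$out"
# ame
# ohnson
# illiams

rm -f "$sample"
```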

Example 3: Print the Substring of the First Column (Ending Point Defined Only)

Unlike the “length” parameter, the starting point cannot be left out. However, to extract from the beginning of the string up to a chosen length, the starting point can simply be set to “1”, because awk counts characters from 1.

Specify “1” as the starting point to extract a substring of a particular length (user's choice) from the first column “Name”. A starting point of “0” is best avoided, since positions below 1 are outside the string and awk implementations may differ in how many characters they return in that case:

$ awk '{print substr($1,1,5)}' script.txt

Here, the first “5” characters of each string in the first column “Name” have been printed on the terminal.
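Because awk numbers characters from 1, a start of “1” is the portable way to take the leading characters of a column. A minimal sketch, assuming hypothetical sample rows in place of the real file:

```shell
# Hypothetical sample data; the real content of script.txt is not shown in this guide.
sample=$(mktemp)
printf '%s\n' 'Name Age ID' 'Johnson 25 101' 'Williams 30 102' > "$sample"

# First 5 characters of column 1; shorter values are returned whole.
out=$(awk '{ print substr($1, 1, 5) }' "$sample")
printf '%s\n' "$out"
# Name
# Johns
# Willi

rm -f "$sample"
```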

Example 4: Replace the Content of the First Column

Up to example 3, only parts of the first column were extracted. The “awk” command can also replace the value of a column. Note that “awk” prints the result on the terminal, while the original file remains unchanged unless the output is saved, e.g., by redirecting it to another file.

In this scenario, the “Name” string in the first column is replaced by “Emp-Names” using the below-mentioned command:

$ awk '{if($1=="Name") {$1="Emp-Names"} print $0}' script.txt

In the above command, the “if” statement is used for comparison, i.e., if “$1” (column 1) contains the string “Name”, then it is replaced with “Emp-Names”. The “$0” prints the whole line, so every line of the “script.txt” file is displayed.

The “Name” string has been replaced with “Emp-Names” in the output of the command.
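To actually store the replaced text in the file, one common approach is to write the awk output to a temporary file and then move it over the original. The sketch below assumes hypothetical sample rows and works in a scratch directory so that no real file is touched:

```shell
# Hypothetical sample data in a scratch directory; names and rows are illustrative.
work=$(mktemp -d)
file="$work/script.txt"
printf '%s\n' 'Name Age ID' 'Johnson 25 101' > "$file"

# awk writes to standard output, so save the result to a temporary file
# and replace the original only if awk succeeded.
awk '{ if ($1 == "Name") { $1 = "Emp-Names" } print $0 }' "$file" > "$file.tmp" \
  && mv "$file.tmp" "$file"

header=$(sed -n '1p' "$file")
second=$(sed -n '2p' "$file")
echo "$header"   # Emp-Names Age ID
echo "$second"   # Johnson 25 101

rm -rf "$work"
```

Assigning a new value to “$1” makes awk rebuild that line with single spaces between the columns, so any original column alignment on the modified line is not preserved.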

Conclusion

In Linux, the “substr” denotes the “substring” function of the “awk” command, which extracts part of a string, such as the first column “$1” of each line. It returns the characters starting at the given position, either for the specified length or up to the end of the string. In addition, the “awk” tool is also useful for replacing a string in its output, which can then be saved back to the file.

This guide has explained how to extract a substring from only the first column with awk in Linux.