
How to remove ^M and other non-printable characters from the file in Linux

If you work with files on more than one operating system and copy or move them from one system to another, you may be carrying non-printable characters along without knowing it. This typically happens when you take a file from a Windows machine and open it on Linux in an editor (vim in my case). You will see ^M at the end of each line. It is the carriage return that Windows, the step brother of Linux, puts before every line feed :p

What is this ^M and how did it come into the picture?
^M is the carriage return from Windows line endings. Unix uses a single character for a newline, 0xA (10 in decimal), while Windows uses a combination of two characters: 0xD (13 in decimal) followed by 0xA. ^M happens to be 0xD. These are the ASCII values of \r (0xD) and \n (0xA).
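If you want to see those two bytes for yourself, here is a small sketch that dumps both kinds of line ending as hex. The variable names are only for illustration, and it assumes a POSIX shell with od and tr available.

```shell
# Dump the bytes of a Windows-style line and a Unix-style line as hex.
# od -An -tx1 prints each byte as a hex pair; tr strips the spacing.
crlf_bytes=$(printf 'hi\r\n' | od -An -tx1 | tr -d ' \n')
lf_bytes=$(printf 'hi\n' | od -An -tx1 | tr -d ' \n')

echo "Windows ending: $crlf_bytes"   # 68 69 0d 0a -> h i CR LF
echo "Unix ending:    $lf_bytes"     # 68 69 0a    -> h i LF
```

The extra 0d in the first result is the byte that editors and cat -v display as ^M.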
Usually anything displayed with a leading ^ is a non-printable character (but don’t be so sure!), for example ^L or ^@.

How will I find out if it’s a printable or non-printable character?
Just print the file with cat, once without any option and once with the -v option, and compare the outputs.
Ex: cat test.txt and cat -v test.txt (the -v option shows non-printable characters too.)

Output using the cat command without -v
cat test.txt
||This is first line
|| Lets see if ^M is rintable or non-printable this time
|| you can type ^@ even or ^S
||this line will give a page break signal for printers

|| cat command is important to check this with -v option
||0 Lets say its last line

Output using the cat command with the -v option
cat -v test.txt
|^@|This is first line^M
|^@| Lets see if ^M is rintable or non-printable this time^M
|^@| you can type ^@ even or ^S^M
|^@|this line will give a page break signal for printers
^M
|^@| cat command is important to check this with -v option^M
|^@|0 Lets say its last line^M

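You can observe that the ^@ at the beginning and the ^M at the end of each line are non-printable characters: they appear only in the cat -v output. The ^M and ^@ in the middle of the lines were typed by me in vim as ordinary characters, so they are printable here and show up in both outputs. Just compare the two outputs.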

Note: The output doesn’t show the ^L that was in the file. It is a page-break signal for the printer and it does not hamper the output. That’s why we see one blank line in our output, whereas the original file has no blank line.
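To try the same comparison without a Windows file at hand, the sketch below builds a throwaway sample and checks it. The sample content and variable names are made up for illustration. It assumes a cat that supports -v (GNU and BSD cat do).

```shell
# Create a throwaway sample: a NUL byte up front and CRLF line endings.
sample=$(mktemp)
printf '\000first line\r\nsecond line\r\n' > "$sample"

# cat -v renders NUL as ^@ and CR as ^M; the LF still ends each line.
visible=$(cat -v "$sample")
echo "$visible"

# Count the lines that really end in a carriage return byte.
# NUL bytes are stripped first so grep treats the input as text.
cr=$(printf '\r')
cr_lines=$(tr -d '\000' < "$sample" | grep -c "${cr}\$")
echo "lines ending in CR: $cr_lines"

rm -f "$sample"
```

Counting the real CR byte, rather than searching the cat -v output for the two characters ^ and M, avoids counting a ^M that somebody typed as plain text.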

How can we solve it then?

To remove the ^M at the end of each line, we can do any of the following:

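  1. Linux command shell
     dos2unix -n file-name new-file-name
     (dos2unix is a Linux utility that converts Windows-style files to a Linux-compatible format, so it removes the ^M from the end of each line automatically. The -n option writes the result to new-file-name; without it, dos2unix converts every file named on the command line in place.)
  2. By setting the vim configuration file, i.e. .vimrc
     Open .vimrc, which is in your home directory, i.e. /home/aliencoders/.vimrc, and save this line:
     set ffs=unix,dos
     (With this setting vim detects Windows line endings when every line has them and no longer displays ^M. To convert the file itself, run :set ff=unix before saving.)
  3. By editing the file manually. Open test.txt in vim and run the following substitution in command mode, i.e. after pressing the Esc key:
     :1,$s/^M//g or :%s/^M//g     (Type ^M as Ctrl+V followed by Ctrl+M; typing ^ and M as two separate characters will not match.)
  4. By using the strings command in Linux. It prints only the printable characters of a file, so the non-printable ones are left out.
     strings file-name > new-file-name
     Now new-file-name will not contain those non-printable characters. Note that by default strings prints only runs of at least 4 printable characters, so very short lines are dropped too.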
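If neither dos2unix nor vim is available, the same cleanup can be sketched with tr or sed. This is an alternative to the methods listed above, not one of them, and the file names are placeholders created in a temporary directory.

```shell
# Throwaway input with Windows line endings (file names are placeholders).
workdir=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$workdir/dos.txt"

# Option A: delete every carriage return, wherever it appears.
tr -d '\r' < "$workdir/dos.txt" > "$workdir/unix_tr.txt"

# Option B: delete a carriage return only at the end of a line.
cr=$(printf '\r')
sed "s/${cr}\$//" "$workdir/dos.txt" > "$workdir/unix_sed.txt"

# Dump both results as hex to confirm that only LF (0a) is left.
tr_bytes=$(od -An -tx1 < "$workdir/unix_tr.txt" | tr -d ' \n')
sed_bytes=$(od -An -tx1 < "$workdir/unix_sed.txt" | tr -d ' \n')
echo "tr:  $tr_bytes"
echo "sed: $sed_bytes"

rm -rf "$workdir"
```

Option B is the safer choice when a carriage return in the middle of a line might be intentional, because it only touches the end of each line.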

Tip: To get rid of other such junk, like ^@, use either step 3 or step 4.
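Another way to drop ^@ and similar junk in one pass is tr with a whitelist of bytes. This is a rough sketch, not part of the steps above, and it assumes the file is plain ASCII: anything outside tab, newline and printable ASCII is deleted, which includes accented or other non-ASCII characters.

```shell
# Sample line with NUL (^@), form feed (^L) and CR (^M) mixed into the text.
dirty=$(mktemp)
printf '\000ab\014cd\r\n' > "$dirty"

# Keep only tab (011), newline (012) and printable ASCII (040-176); drop the rest.
# LC_ALL=C makes tr work byte by byte regardless of the current locale.
clean_bytes=$(LC_ALL=C tr -cd '\11\12\40-\176' < "$dirty" | od -An -tx1 | tr -d ' \n')
echo "$clean_bytes"   # 61 62 63 64 0a -> "abcd" plus newline

rm -f "$dirty"
```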

Comments, other ideas to fix it, and feedback are most welcome!
