Archive for February 10th, 2015

Remove \r (Carriage Return) from string with standard bash shell / sed / tr / vim or awk – Replace \r hidden messy characters from files

Tuesday, February 10th, 2015

remove_r_carriage_return_from_string_with-standard-bash_shell_sed_tr_or_awk_replace_annoying_hidden_messy_characters_from_files

I've been recently writting this Apache webserver / Tomcat / JBoss / Java decomissioning bash script. Part of the script includes extraction from httpd.conf of DocumentRoot variable configured for Apache host.
I was using following one liner to grep and store DocumentRoot set directory into new variable:

documentroot=$(grep -i documentroot /usr/local/apache/conf/httpd.conf | awk '{ print $2 }' |sed -e 's#"##g');

Above line greps for documentroot prints 2nd column of the matchi (which is the Apache server set docroot and then removes any " chars).

However I faced the issue that parsed string contained in $documentroot variable there was mysteriously containing r – return carriage – this is usually Carriage Return (CR) sent by Mac OS and Apple computers. For those who don't know the End of Line of files in UNIX / Linux OS-es is LF – often abreviated as n – often translated as return new line), while Windows PCs use for EOF CR + LF – known as the infamous  rn. I was running the script from the server which is running SuSE SLES 11 Linux, meaning the CR + LF end of file is standardly used, however it seem someone has editted the httpd.conf earlier with a text editor from Mac OS X (Terminal). Thus I needed a way to remove the r from CR character out of the variable, because otherwise I couldn't use it to properly exec tar to archive the documentroot set directory, cause the documentroot directory was showing unexistent.

Opening the httpd.conf in standard editor didn't show the r at the end of
"directory", e.g. I could see in the file when opened with vim

DocumentRoot "/usr/local/apache/htdocs/site/www"

However obviously the r character was there to visualize it I had to use cat command -v option (–show-nonprinting):

cat -v /usr/local/apache/conf/httpd.conf

DocumentRoot "/usr/local/apache/htdocs/site/wwwr"


1. Remove the r CR with bash

To solve that with bash, I had to use another quick bash parsing that scans through $directory and removes r, here is how:

documentroot=${documentroot%$'r'}

It is also possible to use same example to remove "broken" Windows rn Carriage Returns after file is migrated from Windows to Liunx /  FreeBSD host:

documentroot=${documentroot%$'rn'}

 

2. Remove r Carriage Return character with sed

Other way to do remove (del) Windows / Mac OS Carriage Returns in case if Migrating to UNIX is with sed (stream editor).

sed -i s/r// filename >> filename_out.txt


3. Remove r CR character with tr

There is a third way also to do it with (tr) – translate or delete characters old shool *nix command:

tr -d 'r' < file_with_carriagereturns > file_without_carriage_returns

 

4. Remove r CRs with awk (pattern scanning and processing language)

 awk 'sub("$", "r")' inputf_with_crs.txt > outputf_without_crs.txt


5. Delete r CR with VIM editor

:%s/r//g


6. Converting  file DOS / UNIX OSes with dos2unix and unix2dos command line tools

For sysadmins who don't want to bother with writting code to convert CR when moving files between Windows and UNIX hosts there are dos2unix and unix2dos installable commands.

All done Cheers ! 🙂