Can anyone show me how to grep only the words containing the pattern _ARA from the single string below?
String:
LINK:['IM219MIR_ARA1','IM18Q4_ARA1','SM18Q4_ARA1','IM18PLANNING_ARA1','IM118Q4DYNVA_ARA1','IM218Q4DYNVA_ARA1','IM119EIOPALTG_ARA1','IM219EIOPALTG_ARA1','SM119EIOPALTG_ARA1']}
Expected output:
IM219MIR_ARA1
IM18Q4_ARA1
SM18Q4_ARA1
IM18PLANNING_ARA1
IM118Q4DYNVA_ARA1
IM218Q4DYNVA_ARA1
IM119EIOPALTG_ARA1
IM119EIOPALTG_ARA1
IM219EIOPALTG_ARA1
SM119EIOPALTG_ARA1
grep accepts -o to print only matching text, on separate lines even if the matches came from the same line. It also accepts -w to force the regular expression to match an entire word (or not match at all), where a word is a maximal sequence of letters, numerals, and underscores. So you can simply use:
grep -ow '\w*_ARA\w*'
In this case you can actually omit the -w option if you like, and get the same result, since the regular expression here is explicitly matching only word characters with \w.
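If the string isn't in a file yet, one way to try the command is to pass it on standard input. This is a minimal sketch assuming bash (for the <<< here-string) and GNU grep (for \w); the variable names line, with_w and without_w are only illustrative. It also checks that dropping -w gives the same matches for this input.

```shell
#!/usr/bin/env bash
# The string from the question (the variable name "line" is illustrative).
line="LINK:['IM219MIR_ARA1','IM18Q4_ARA1','SM18Q4_ARA1','IM18PLANNING_ARA1','IM118Q4DYNVA_ARA1','IM218Q4DYNVA_ARA1','IM119EIOPALTG_ARA1','IM219EIOPALTG_ARA1','SM119EIOPALTG_ARA1']}"

# -o prints each match on its own line; -w requires each match to be a whole word.
with_w=$(grep -ow '\w*_ARA\w*' <<< "$line")

# Without -w: \w* already stops at non-word characters such as ' and ,
without_w=$(grep -o '\w*_ARA\w*' <<< "$line")

printf '%s\n' "$with_w"
if [ "$with_w" = "$without_w" ]; then
  echo "same result with and without -w"
fi
```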
That command reads from standard input, because there are no filename arguments. If the text you showed is in a file (say, one called input.txt), pass that file as an argument:
grep -ow '\w*_ARA\w*' input.txt
This outputs:
IM219MIR_ARA1
IM18Q4_ARA1
SM18Q4_ARA1
IM18PLANNING_ARA1
IM118Q4DYNVA_ARA1
IM218Q4DYNVA_ARA1
IM119EIOPALTG_ARA1
IM219EIOPALTG_ARA1
SM119EIOPALTG_ARA1
Technically, the output this produces is slightly different from what you showed in your question, because the expected output you showed lists IM119EIOPALTG_ARA1 twice, even though it appears only once in the text you showed. I presume this is a mistake and you actually want it just once.
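A self-contained sketch of the file-based form, assuming bash and GNU grep. It writes the question's string to input.txt inside a fresh temporary directory (so it won't overwrite a file of yours) and counts the total and distinct matches; both come out as 9 for this input, so each ID is printed once.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so an existing input.txt is left alone.
cd "$(mktemp -d)" || exit 1

# Save the question's string, with a trailing newline, to input.txt.
printf '%s\n' "LINK:['IM219MIR_ARA1','IM18Q4_ARA1','SM18Q4_ARA1','IM18PLANNING_ARA1','IM118Q4DYNVA_ARA1','IM218Q4DYNVA_ARA1','IM119EIOPALTG_ARA1','IM219EIOPALTG_ARA1','SM119EIOPALTG_ARA1']}" > input.txt

# Same command as above, reading the file instead of standard input.
matches=$(grep -ow '\w*_ARA\w*' input.txt)
printf '%s\n' "$matches"

# Total matches vs. distinct matches (equal means no ID is repeated).
total=$(printf '%s\n' "$matches" | wc -l)
distinct=$(printf '%s\n' "$matches" | sort -u | wc -l)
```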
If you want to use the cut and sed commands instead, use this:
<test.txt cut -d'[' -f2 | cut -d']' -f1 | sed "s/,'/\\n/g" | sed 's/.$//' | cut -d\' -f2 | grep _ARA
Explanation in 2 parts:
- grep _ARA keeps only the lines that contain _ARA.
- cut -d'[' -f2 removes the characters before your words, and cut -d']' -f1 removes what comes after them.
- sed "s/,'/\\n/g" puts each word on its own line.
- <test.txt is just a redirection that feeds test.txt to the first cut command.
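To see what each step of the first part does on its own, you can run the stages one at a time. This is a sketch assuming bash and GNU sed; it recreates test.txt in a temporary directory. GNU sed turns the \n in the replacement into a newline, but other implementations, such as the BSD sed on macOS, may not, so that step is GNU-specific.

```shell
#!/usr/bin/env bash
# Recreate test.txt in a throwaway directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' "LINK:['IM219MIR_ARA1','IM18Q4_ARA1','SM18Q4_ARA1','IM18PLANNING_ARA1','IM118Q4DYNVA_ARA1','IM218Q4DYNVA_ARA1','IM119EIOPALTG_ARA1','IM219EIOPALTG_ARA1','SM119EIOPALTG_ARA1']}" > test.txt

# Stage 1: keep only what follows the first '['.
stage1=$(cut -d'[' -f2 < test.txt)
echo "after cut -d'[' -f2: $stage1"

# Stage 2: keep only what precedes the first ']'.
stage2=$(printf '%s\n' "$stage1" | cut -d']' -f1)
echo "after cut -d']' -f1: $stage2"

# Stage 3: replace every ",'" separator with a newline (GNU sed reads \n as newline).
stage3=$(printf '%s\n' "$stage2" | sed "s/,'/\\n/g")
printf '%s\n' "$stage3"
```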
Running only these four commands (the two cut commands, the first sed, and grep), the result is:
'IM219MIR_ARA1'
IM18Q4_ARA1'
SM18Q4_ARA1'
IM18PLANNING_ARA1'
IM118Q4DYNVA_ARA1'
IM218Q4DYNVA_ARA1'
IM119EIOPALTG_ARA1'
IM219EIOPALTG_ARA1'
SM119EIOPALTG_ARA1'
So, to remove the ' at the end of each word, we add
sed 's/.$//'
and to remove the first ' (only the first word still has one), we use
cut -d\' -f2
which prints lines containing no ' unchanged. So the final result is:
IM219MIR_ARA1
IM18Q4_ARA1
SM18Q4_ARA1
IM18PLANNING_ARA1
IM118Q4DYNVA_ARA1
IM218Q4DYNVA_ARA1
IM119EIOPALTG_ARA1
IM219EIOPALTG_ARA1
SM119EIOPALTG_ARA1
If you want more details about this command, you can read my discussion with Eliah Kagan.
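The complete cut/sed pipeline can be checked against the grep -ow command from the first answer. This is again a sketch assuming bash, GNU grep and GNU sed, run in a temporary directory so an existing test.txt is left alone:

```shell
#!/usr/bin/env bash
# Recreate test.txt in a throwaway directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' "LINK:['IM219MIR_ARA1','IM18Q4_ARA1','SM18Q4_ARA1','IM18PLANNING_ARA1','IM118Q4DYNVA_ARA1','IM218Q4DYNVA_ARA1','IM119EIOPALTG_ARA1','IM219EIOPALTG_ARA1','SM119EIOPALTG_ARA1']}" > test.txt

# The cut/sed pipeline from this answer.
via_cut=$(<test.txt cut -d'[' -f2 | cut -d']' -f1 | sed "s/,'/\\n/g" | sed 's/.$//' | cut -d\' -f2 | grep _ARA)

# The grep -ow command from the first answer, for comparison.
via_grep=$(grep -ow '\w*_ARA\w*' test.txt)

printf '%s\n' "$via_cut"
if [ "$via_cut" = "$via_grep" ]; then
  echo "both approaches agree"
fi
```

For this input both commands print the same nine IDs. The grep -ow version is shorter, while the cut/sed version depends on the exact [ ... ] and quote layout of the string.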