What difference does it make matching a word with/without a trailing whitespace?
I am learning shell scripting, and for that I am using HackerRank. There is a sed-related question on that site, 'Sed' command #1:
For each line in a given input file, transform the first occurrence of the word 'the' with 'this'. The search and transformation should be strictly case sensitive.
First, I tried
sed 's/the/this/'
but the sample test case failed. Then I tried
sed 's/the /this /'
and it worked. So what difference does the whitespace make? Am I missing something here?
sed whitespace
I assume the first version also "worked", but not as you expected. It should have replaced the first occurrence of the letter sequence "the", but you probably looked at the first occurrence of the word " the ".
– Dubu
yesterday
Well, in thiseory, yes, in practice, no.
– Rolf
16 hours ago
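Dubu's point can be reproduced with a small sketch. The sample line below is made up for illustration (it is not the HackerRank test input); it is chosen so that the letters "the" first appear inside a longer word.

```shell
# A made-up sample line in which "the" first appears inside a longer word.
sample='Another day in the office'

# Plain s/the/this/ hits the letters "the" inside "Another".
plain=$(printf '%s\n' "$sample" | sed 's/the/this/')

# With the trailing space, "Another" no longer matches, so the article does.
with_space=$(printf '%s\n' "$sample" | sed 's/the /this /')

printf '%s\n' "$plain"        # Anothisr day in the office
printf '%s\n' "$with_space"   # Another day in this office
```

Both commands replace only the first match on the line; they differ in which text counts as a match.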
asked 2 days ago by JHA (new contributor), edited yesterday by Kusalananda
3 Answers
The difference is whether there is a space after "the" in the input text.
For instance:
If there is no space after "the", no replacement occurs:
$ echo 'theman' | sed 's/the /this /'
theman
If there is a space after "the", the replacement works as expected:
$ echo 'the man' | sed 's/the /this /'
this man
If "the" is followed by a different whitespace character (here, a tab), no replacement occurs:
$ echo -e 'the\tman' | sed 's/the /this /'
the	man
I missed that. I had to take "the" as a string. Not a substring.
– JHA
2 days ago
@JHA: It also matters at the end of a line. For example, the word "the" could appear at the end of a line in a file with line wrapping, but still be in the middle of a paragraph and thus still be a normal word in an English sentence. the( |$) might be closer to working, if that extended regex works. Anyway, IDK what you mean by "as a string" vs. substring. In both cases it's a substring of the whole line, and your test cases are insufficient to detect the cases where "the " fails. Kusalananda's answer is significantly better; I'd recommend accepting it.
– Peter Cordes
yesterday
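A minimal sketch of the extended-regex idea from the comment above, assuming a sed that accepts -E (GNU sed and BSD sed both do). The function name and the sample lines are made up for illustration.

```shell
# Replace the first "the" that is followed by a space or by the end of the line.
# \1 puts back whatever followed it (a space, or nothing at end of line).
replace_the_before_space_or_eol() {
    sed -E 's/the( |$)/this\1/'
}

printf '%s\n' 'over the'    | replace_the_before_space_or_eol   # over this
printf '%s\n' 'the man'     | replace_the_before_space_or_eol   # this man
printf '%s\n' 'theman'      | replace_the_before_space_or_eol   # theman
printf '%s\n' 'bathe daily' | replace_the_before_space_or_eol   # bathis daily
```

The last line shows why this is only "closer to working": nothing stops the pattern from matching at the end of a longer word.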
It's a cheap and error-prone way of doing word matching.
Note that "the" with a space after it does not match the word "thereby", so matching with a space after "the" avoids matching that string at the start of words. However, it still does match "bathe" (if followed by a space), and it does not match "the" at the end of a line.
To match the word "the" properly (or any other word), you should not use spaces around the word, as that would prevent you from matching it at the start or end of lines or if it's flanked by any other non-word character, such as punctuation or a tab character.
Instead, use a zero-width word boundary pattern:
sed 's/\<the\>/this/'
The \< and \> match the boundaries before and after the word, i.e. the zero-width position between a word character and a non-word character. A word character is generally any character matching [[:alnum:]_] (or [A-Za-z0-9_] in the POSIX locale).
With GNU sed, you could also use \b in place of \< and \>:
sed 's/\bthe\b/this/'
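The word-boundary behaviour can be checked with a few inputs. This is a sketch that assumes GNU sed, since \< and \> are extensions that not every sed implements; the function name and the sample lines are made up for illustration.

```shell
# Replace the first whole-word "the" on each line (GNU sed word-boundary syntax).
replace_word_the() {
    sed 's/\<the\>/this/'
}

printf '%s\n' 'bathe in the sea' | replace_word_the   # bathe in this sea
printf '%s\n' 'thereby the end'  | replace_word_the   # thereby this end
printf '%s\n' 'this is the'      | replace_word_the   # this is this
printf '%s\n' 'the, then'        | replace_word_the   # this, then
```

The cases cover "the" inside a longer word, at the start of a longer word, at the end of a line, and followed by punctuation.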
sed works with regular expressions.
Using sed 's/the /this /' you just make the space after "the" part of the matched pattern.
Using sed 's/the/this/' you replace all occurrences of "the" with "this", no matter if a space exists after "the".
In the HackerRank exercise, the result is the same because replacing "the" with "this" is logical: you replace just an article, which is normally followed by a space (grammar rules).
You can see the difference if you try, for example, to capitalize "the" in the phrase "the theater":
echo 'the theater' | sed 's/the /THE /g'
THE theater
# "theater" is left alone because its "the" is not followed by a space
echo 'the theater' | sed 's/the/THE/g'
THE THEater
# both occurrences of "the" are capitalized
Thank you for the answer. Appreciated :)
– JHA
2 days ago
"you replace all occurrences" To be clear: Without the g after the replacement text, you replace only the first occurrence.
– Dubu
yesterday
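Dubu's correction is easy to verify. A minimal sketch, with a made-up sample line and variable names chosen for illustration:

```shell
line='the theater near the theme park'

# No flag: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/the/THE/')

# g flag: every match on the line is replaced.
every_match=$(printf '%s\n' "$line" | sed 's/the/THE/g')

printf '%s\n' "$first_only"    # THE theater near the theme park
printf '%s\n' "$every_match"   # THE THEater near THE THEme park
```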
First answer (trailing-space examples): answered 2 days ago by BDR (new contributor), edited 2 days ago by G-Man
Second answer (word boundaries): answered 2 days ago by Kusalananda, edited yesterday
Third answer ("the theater" example): answered 2 days ago by George Vasiliou, edited 2 days ago by JHA