What does the following number-processing sed code mean?

I came across some code in a script that adds a ',' after every 3 digits, counting from the right. The code only considers numeric data.

The code is as follows.

sed 's/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2,\3/g' number.txt

number.txt

1234
12345
123456

output

Can anyone explain how the code works?

asked yesterday by Dhaval kale, edited yesterday by Jeff Schaller

You appear to know what the code does; there's no branching, and only one command. Where does your misunderstanding come in?

– Jeff Schaller♦
yesterday

3 Answers

sed (the stream editor) can search and replace using regular expressions. There is some sed-specific escaping going on, but for the regex itself you can consult the explanation tool from regexr:

( Capturing group #1. Groups multiple tokens together and creates a capture group for extracting a substring or using a backreference.
    ^ Beginning. Matches the beginning of the string.
    | Alternation. Acts like a boolean OR. Matches the expression before or after the |.
    [^ Negated set. Match any character that is not in the set.
        0-9 Range. Matches a character in the range "0" to "9". Case sensitive.
        . Character. Matches a "." character.
    ]
)

( Capturing group #2. Groups multiple tokens together and creates a capture group for extracting a substring or using a backreference.
    [ Character set. Match any character in the set.
        0-9 Range. Matches a character in the range "0" to "9". Case sensitive.
    ]
    + Quantifier. Match 1 or more of the preceding token.
)

( Capturing group #3. Groups multiple tokens together and creates a capture group for extracting a substring or using a backreference.
    [ Character set. Match any character in the set.
        0-9 Range. Matches a character in the range "0" to "9". Case sensitive.
    ]
    {3} Quantifier. Match 3 of the preceding token.
)
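One way to check this breakdown is to run the same pattern through grep -E, which uses the extended syntax without backslashes. This is only a minimal sketch: the four sample inputs and the helper name matching_lines are made up for illustration and do not come from the question.

```shell
# Print only the lines that contain a match for the pattern.
# The sample inputs are hypothetical, chosen for illustration.
pattern='(^|[^0-9.])([0-9]+)([0-9]{3})'

matching_lines() {
    printf '%s\n' 123 1234 abc1234 1.2345 | grep -E "$pattern"
}

matching_lines
# Prints:
# 1234
# abc1234
```

123 is rejected because groups 2 and 3 together need at least four digits, and 1.2345 is rejected because its only run of four digits follows a dot, which group 1 refuses to match.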

answered yesterday by Hermann, edited yesterday by Jeff Schaller

... "0-9 Range. Matches a character in the range "0" to "9". Case sensitive": by "case sensitive", do you mean English digits, or something else?

– αғsнιη
yesterday

No, case sensitive means that upper-case and lower-case letters are treated as different. Regular expressions are usually case sensitive (unless configured otherwise, e.g. with an i switch). In this particular case, the character set only contains digits. They have no upper-case and lower-case variants, so the case sensitivity does not actually matter. This is the output as given by regexr; feel free to try it. The explanation tool is quite interactive.

– Hermann
yesterday

The pattern captures (1) the start of the line or a character that is neither a digit nor a dot, followed by (2) one or more digits, followed by (3) exactly three digits.
It then puts them back with a comma between (2) and (3), in effect adding one thousands separator. The first group is only needed to avoid touching fractional parts after a decimal point, since we don't want 1.2345 to turn into 1.2,345.
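The role of the first group can be seen by comparing the pattern with a cut-down variant that omits it. The sketch below assumes a sed that supports -E; the helper names with_guard and without_guard, and the variant without group 1, are hypothetical and exist only for this comparison.

```shell
# Assumes a sed that supports -E (extended regular expressions).
with_guard()    { sed -E 's/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g'; }

# Hypothetical variant without group 1, for comparison only.
without_guard() { sed -E 's/([0-9]+)([0-9]{3})/\1,\2/g'; }

echo '1.2345'     | with_guard      # 1.2345 (left alone)
echo '1.2345'     | without_guard   # 1.2,345 (fraction damaged)
echo '12345.6789' | with_guard      # 12,345.6789
```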

Note that the pattern is written in basic regular expressions (BRE), requiring backslashes in front of each () and {} to make them special. Moreover, it requires GNU sed, where \+ and \| also have special meanings in BRE as an extension. The command would be better written as an extended regular expression (sed -E is supported by many sed implementations):

sed -E 's/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g'
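As a rough check that the two spellings are equivalent, both can be run on the sample numbers from the question. This sketch assumes GNU sed, because \+ and \| in a basic regular expression are GNU extensions; the helper names bre and ere are hypothetical.

```shell
# Assumes GNU sed: \+ and \| inside a BRE are GNU extensions.
bre() { sed 's/\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\)/\1\2,\3/g'; }
ere() { sed -E 's/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g'; }

printf '%s\n' 1234 12345 123456 | bre
printf '%s\n' 1234 12345 123456 | ere
# With GNU sed, both commands print:
# 1,234
# 12,345
# 123,456
```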

Also, the pattern only does one replacement per number; it does not add multiple thousands separators to the same number. The /g at the end will match multiple times on the same line, but it doesn't process already replaced data. 1234567 will become 1234,567, not 1,234,567. To fix that, we'll need to add a loop:

sed -E -e :a -e 's/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g' -e ta

Here, :a is just a label, and the final ta tests for a successful replacement, and jumps back to a if a replacement was done, in effect repeating the process as many times as it does something. Hence 1234567 will turn into 1,234,567.
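The difference between the single pass and the loop can be sketched as follows, assuming a sed that supports -E. The helper names single_pass and add_commas are hypothetical, and the ten-digit input is an extra example that is not in the original answer.

```shell
# Assumes a sed that supports -E; both helper names are hypothetical.
single_pass() { sed -E 's/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g'; }
add_commas()  { sed -E -e :a -e 's/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g' -e ta; }

echo 1234567    | single_pass   # 1234,567
echo 1234567    | add_commas    # 1,234,567
echo 1234567890 | add_commas    # 1,234,567,890
```

The loop stops on its own: once every run of digits is at most three long, the substitution no longer matches and ta does not jump.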

answered yesterday by ilkkachu

sed 's/foo/bar/g' number.txt: This reads the file number.txt, and replaces the regex pattern foo with bar. This will occur for all matches on each line (/g).
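A quick way to see what the trailing g changes is to run the placeholder substitution with and without it. The input line here is made up for illustration.

```shell
# Without g, only the first match on each line is replaced.
echo 'foo foo' | sed 's/foo/bar/'    # bar foo

# With g, every match on the line is replaced.
echo 'foo foo' | sed 's/foo/bar/g'   # bar bar
```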

\(^\|[^0-9.]\)\([0-9]\+\)\([0-9]\{3\}\): This is the pattern to replace. Each part in the escaped parentheses \(…\) is a "capturing group". The pattern inside is "captured" for later use.
- \(^\|[^0-9.]\): Find the beginning of the line ^ or \| a character that is not a numeral or period [^0-9.]. This essentially finds the character preceding a number.
- \([0-9]\+\): Find one or more numerals [0-9]\+.
- \([0-9]\{3\}\): Find 3 numerals [0-9]\{3\}.

\1\2,\3: Replace the above matches with the first two capturing groups followed by , then the last capturing group. In other words, insert , between the second and third patterns.

Because sed's quantifiers are "greedy", [0-9]\+ will try to match as many numerals as possible while still leaving three for the last group. Hence the final capturing group will be the last three numerals in the number.
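To make the split between the groups visible, the replacement can wrap each group in brackets instead of inserting a comma. This is only a sketch, assuming a sed that supports -E; the helper name show_groups and the bracketed replacement are hypothetical.

```shell
# Assumes a sed that supports -E.
# The brackets are added only to show what each group captured.
show_groups() { sed -E 's/(^|[^0-9.])([0-9]+)([0-9]{3})/[\1][\2][\3]/'; }

printf '%s\n' 1234 12345 1234567 | show_groups
# Prints:
# [][1][234]
# [][12][345]
# [][1234][567]
```

The first pair of brackets is empty because group 1 matched the start of the line, which has no width.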

N.B. many of the "special" characters are escaped with \, e.g. \(…\) and \{3\}. If your sed supports "extended regular expressions" with -E or -r, then you do not have to escape these. This improves readability.

answered yesterday by Sparhawk, edited yesterday
