concurrent processing in bash using process expansion, and redirection
Apart from possible races that have to be controlled by using proper synchronisation, it's possible in bash to feed a data source concurrently into multiple pipelines and collect all their outputs later into a common data sink.
For example, if you wanted to pre-process header and body of an email separately by different processes before sending it, you could do this as follows:
    cat email.txt \
    | sed -e '1,/^$/d' 3>&1 \
    | sendmail -oi -- test@example.org
Given that, I was looking for a way to make the output of one of these pipelines appear on the command line of one of the other pipelines or of the final data sink. The best I could achieve so far was using a named pipe and xargs's -a option, which allows for having two sources of input.
For example, to append -- automatically -- the number of lines in an email's body to the email's subject line, one could use:
    cat email.txt \
    | sed -e '1,/^$/d' \
    3>&1 \
    | xargs -I% -a ~/.fifo sed -e '1,/^$/{/^Subject:/Is/$/ (%)/}' \
    | sendmail ...
(xargs -I% -a /dev/fd/4 4<~/.fifo ... also works, cf. below.) In this example the file ~/.fifo is a named pipe, created with mkfifo ~/.fifo.
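For illustration, here is a self-contained sketch of the named-pipe variant. It assumes GNU xargs (for -a) and GNU sed (for the I address flag); the sample message, the temporary fifo path and the variable names are made up for the demonstration, and the side branch that counts the body lines is my guess at what the pipeline above is meant to do.

```shell
#!/usr/bin/env bash
# Sketch only: the sample message and the temporary fifo path are hypothetical.
email=$'From: foo\nTo: bar\nSubject: blah\n\nbody1\nbody2\nbody3\n'

tmpdir=$(mktemp -d)
fifo="$tmpdir/fifo"
mkfifo "$fifo"

result=$(
  printf '%s' "$email" \
  | tee >(sed -e '1,/^$/d' | wc -l > "$fifo") \
  | xargs -I% -a "$fifo" sed -e '1,/^$/{/^Subject:/Is/$/ (%)/}'
)
# tee copies the message to a side branch that strips the header, counts the
# body lines and writes the count into the fifo. xargs reads that count from
# the fifo (-a) while leaving stdin untouched, so the sed it starts still
# receives the full message on stdin.

rm -rf "$tmpdir"
printf '%s\n' "$result"
```

Note that this only works for messages small enough to fit into the pipe buffers, since the final sed does not start reading before the count has arrived.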
But when I try to do this without a named pipe by using only file descriptors and redirection, e.g. analogous to the first example,
    cat email.txt \
    | tee >(sed -ne '1,/^$/p' >&3) \
    3>&1 \
    | xargs -I% -a /dev/fd/4 sed -e '1,/^$/{/^Subject:/Is/$/ (%)/}' \
    | sendmail ...
this only results in an error:
xargs: Cannot open input file ‘/dev/fd/4’: No such file or directory
bash: 4: Bad file descriptor
[Update: Replacing the -a /dev/fd/4 with -a <(cat <&4) in the xargs call doesn't work either; the complaint about the non-existent /dev/fd/4 is just replaced by another Bad file descriptor error. It seems to me that the fd 4 that is used for output (>&4) is not connected to the fd 4 that is used for input (<&4 or /dev/fd/4).]
Is there any way to get rid of the named pipe by some clever combination of redirection and process expansion? And, of course, without stating the data source more than once as in
    nol="$(sed -e '1,/^$/d' email.txt | wc -l)"
    sed -e "1,/^$/{/^Subject:/Is/$/ ($nol)/}" email.txt | sendmail ...
bash io-redirection file-descriptors fifo process-substitution
asked yesterday by serolmy (edited yesterday)
1 Answer
The errors in your command are because fd 4 is not open at all.
In fact you receive two "bad file descriptor" messages, one from the wc -l and the other from the cat <&4 (or the xargs -a /dev/fd/4).
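A minimal sketch of that point: a descriptor number only becomes usable once the shell has opened it onto something. The variable names are arbitrary, and fd 4 is closed explicitly first in case the calling environment happens to have it open.

```shell
#!/usr/bin/env bash
# Sketch: a descriptor number means nothing until the shell has opened it.
exec 4<&-                          # make sure fd 4 is closed for this demonstration

# Redirecting from the closed fd fails before cat even starts; bash reports the error.
msg=$( { cat <&4; } 2>&1 )
closed_status=$?

exec 4< <(echo 3)                  # open fd 4 on the read end of a pipe
read -r count <&4                  # now the same redirection works
exec 4<&-                          # close it again when done

printf 'message with fd 4 closed: %s\n' "$msg"
printf 'status with fd 4 closed: %s\n' "$closed_status"
printf 'value read through fd 4: %s\n' "$count"
```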
You’d need an unnamed pipe to open fd 4 onto, but the only official way to have unnamed pipes in Bash is actually through the coproc command.
The official way: coproc
There are quite a few approaches to using coproc; for your case I suppose the best one would be as follows:
cat email.txt | (coproc cat ; : {input}<&${COPROC[0]} {output}>&${COPROC[1]} ; tee >( sed -e '1,/^$/d' | { wc -l ; echo EOF ; } >&$output ) | sed -e "1,/^$/{/^Subject:/Is/$/ ($(sed -ne '/^EOF$/q;p' <&$input))/}" ; )
The above command line should yield the intended result as per your example case.
Broken down for explanation: (only for clarity purposes, it cannot work when copied&pasted)
cat email.txt | # pipe data to ...
( # a subshell, which ...
coproc cat ; # ... first spawns the coprocess, a simple cat command
: {cp_output}<&${COPROC[0]}- {cp_input}>&${COPROC[1]}- ; # then moves the coproc’s own fds into new ones whose numbers are stored in the (arbitrary) variables $cp_output and $cp_input
tee # and then mirrors the data from main stdin to ...
>( # ... the side processing, which ...
sed -e '1,/^$/d' | # ... first strips the header part and feeds the body to a compound statement that ...
{
wc -l ; # ... first counts the body lines ...
echo EOF ; # ... then sends an (arbitrary) string for notifying end-of-data ...
} >&$cp_input # ... to the coproc input
)
| # the tee also pipes all main input to ...
sed -e # a sed command which looks for the Subject: line in the header part
"1,/^$/{/^Subject:/Is/$/ (" # so as to append the outcome of the coproc (note the command substitution below), which needs ...
"$(sed -ne '/^EOF$/q;p' <&$cp_output)" # capturing the (arbitrary) EOF string to quit reading from the coproc
")/}" ;
)
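Since the annotated breakdown cannot be pasted as-is, here is a runnable multi-line sketch of the same idea. The sample message and the variable names from_cp and to_cp are made up, GNU sed is assumed for the I address flag, and I use exec rather than : for the fd duplication, which should have the same effect.

```shell
#!/usr/bin/env bash
# Sketch only: sample message and variable names are hypothetical.
email=$'From: foo\nTo: bar\nSubject: blah\n\nbody1\nbody2\nbody3\n'

result=$(
  printf '%s' "$email" | (
    coproc cat                      # the coprocess just bridges its stdin to its stdout
    # Duplicate the coproc's fds so they survive into the child processes below.
    exec {from_cp}<&"${COPROC[0]}" {to_cp}>&"${COPROC[1]}"
    tee >(
      # Side branch: strip the header, count the body lines, then send the
      # count followed by an in-band end marker into the coproc.
      sed -e '1,/^$/d' | { wc -l; echo EOF; } >&"$to_cp"
    ) |
    # Main branch: the command substitution blocks until the marker arrives,
    # then the outer sed appends the count to the Subject: line.
    sed -e "1,/^\$/{/^Subject:/Is/\$/ ($(sed -ne '/^EOF$/q;p' <&"$from_cp"))/}"
  )
)

printf '%s\n' "$result"
```

As with the one-liner, the outer sed only starts once the count is known, so this is only safe for inputs that fit into the pipe buffers.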
A few additional notes:
- a subshell is recommended so that none of the coproc’s data (i.e. process and fds) leaks into the interactive bash (assuming you run this beast interactively!)
- otherwise the management of this coproc’s data is completely up to you, so you may need e.g. to close the fds explicitly with exec {cp_input}<&- or exec {COPROC[1]}<&-
- you can use any command with coproc, but I have always found that a simple cat bridging the two fds makes a handy general-purpose solution; however, you can optimize for performance if you manage to embed one of the worker processes into the coproc itself; in this example that would need a lot of rearranging of the whole command line
- as per the Bash v4 documentation, Bash supports only one coproc at a time
- however, at least from v4.3 onwards it does accept more coprocs, though with an explicit warning, and the Bash v5 docs do not state any limit
- in case of more coprocs you want to use explicit names for each coproc (see the docs for details)
- moving/copying the coproc’s fds to arbitrary fds is required for them to survive the pipelines and process substitutions used in this example, as the ${COPROC[*]} array is not exported to child processes and its own fds are always closed on exec
- the use of an in-band EOF notification string is not strictly required, but I often found it hard to make other approaches synchronously correct
- the piece retrieving the side-band data is the $(sed -ne '/^EOF$/q;p' <&$cp_output); here a command substitution is required because this data carries the EOF string to be intercepted, but if you manage to move that need away from the coproc you can then just read the $cp_output fd directly, as in e.g. your xargs -a command
Then there is also
The unofficial way: true unnamed pipes
This feature is still undocumented as of Bash v5, but works on at least v4.3 (couldn’t test v5 yet).
Unnamed pipes are obtainable using the <(:) redirection syntax.
The same example with unnamed pipes boils down to the following:
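cat email.txt | { : {pipe}<> <(:) ; tee >( sed -e '1,/^$/d' | { wc -l ; echo EOF ; } >&$pipe ) | sed -e "1,/^$/{/^Subject:/Is/$/ ($(sed -ne '/^EOF$/q;p' <&$pipe))/}" ; }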
Broken down for explanation: (only for clarity purposes, it cannot work when copied&pasted)
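cat email.txt | # pipe data to ...
{ # a group command, which ...
: {pipe}<> <(:) ; # ... first opens an unnamed pipe read-write and stores its fd number in the (arbitrary) variable $pipe
tee >( sed -e '1,/^$/d' | { wc -l ; echo EOF ; } >&$pipe ) # ... then mirrors the main input to the side processing, which writes the line count and the EOF string to $pipe
| # the tee also pipes all main input to ...
sed -e # a sed command which looks for the Subject: line in the header part
"1,/^$/{/^Subject:/Is/$/ (" # to append the outcome read from the $pipe fd (note the command substitution below), which needs ...
"$(sed -ne '/^EOF$/q;p' <&$pipe)" # capturing the (arbitrary) EOF string to quit reading from the $pipe fd
")/}" ;
}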
Again a few additional notes:
- opening the unnamed pipe read-write is required, as I found no way to open the usual pair of fds instead, one being the read end and the other the write end
- this means there can’t be the usual EOF event notifying the reader that no more data will come; you have to signal it yourself in some other way, and here I went again for an in-band EOF string, probably the simplest approach synchronization-wise
- like the coproc’s fds, the management of these unnamed pipes is completely up to you, so you may need to close them explicitly with exec {pipe}<&-; in this example I didn’t need to do it because the fds are created in a subprocess (the pipeline)
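A runnable sketch of the unnamed-pipe variant, under the assumption of a Linux system (where /dev/fd entries for pipes can be opened read-write) and GNU sed. The sample message is made up, and exec is used here in place of : to open the descriptor.

```shell
#!/usr/bin/env bash
# Sketch only: sample message is hypothetical; assumes Linux /dev/fd semantics.
email=$'From: foo\nTo: bar\nSubject: blah\n\nbody1\nbody2\nbody3\n'

result=$(
  printf '%s' "$email" | {
    # Open the pipe behind an (empty) process substitution read-write;
    # bash stores the new fd number in $pipe.
    exec {pipe}<> <(:)
    tee >(
      # Side branch: body line count plus an in-band end marker go into the pipe.
      sed -e '1,/^$/d' | { wc -l; echo EOF; } >&"$pipe"
    ) |
    # Main branch: read from the same pipe until the marker, then edit the header.
    sed -e "1,/^\$/{/^Subject:/Is/\$/ ($(sed -ne '/^EOF$/q;p' <&"$pipe"))/}"
  }
)

printf '%s\n' "$result"
```

Because the same descriptor is both the read end and the write end, the reader never sees a real end-of-file, which is why the EOF marker line is needed here.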
This solution makes for somewhat less typing (maybe nicer for a command line) and certainly better performance compared to the coproc solution, as here there’s no cat command (nor anything else) bridging two fds; rather, it’s really just a direct “loopback connection”. Also, I believe it makes for a syntactically smoother solution than coproc, especially when you need more than one concurrent channel.
This is another interesting solution for the specific use-case of the example, with the need for a fifo being replaced by the need for a final sorting step. Unfortunately my real, more complicated use-case actually requires that the result of the 1st child process somehow gets into the command line of the 2nd. (It's about reformatting a table: one process returns a list of the longest content for each column (e.g. "12:8:23:5") and the other process needs this as an option (--colwidth=12:8:23:5) on its command line to do the work.)
– serolmy 17 hours ago
Oh I see, you’d like a sort of side-band channel. Then you may have a use for the (albeit undocumented) unnamed pipe in bash. For your OP example you’d do something like this: cat email.txt | { : {pipe}<> <(:) ; tee >( sed -e '1,/^$/d' | { wc -l ; echo EOF ; } >&$pipe ) | sed -e "/^Subject:/cSubject: $(sed -ne '/^EOF$/q;p' <&$pipe)" ; } . If this achieves the intended result I will update my post with a full explanation.
– LL3 12 hours ago
Wow. This really weird looking stuff works perfectly (well almost, in the example the number of lines is appended, but anyway) on the 1st try (with echo -e "From: foo\nTo: bar\nSubject: blah\n\nbody1\nbody2\nbody3\n\nbody4" instead of cat email.txt).
– serolmy 10 hours ago
Right you are! Of course! :D I’ll put the corrected (i.e. appending) sed in my Answer (though I’m sure you corrected it yourself already). Good!
– LL3 9 hours ago
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f507765%2fconcurrent-processing-in-bash-using-process-expansion-and-redirection%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
The errors in your command are because fd 4 is not open at all.
In fact you receive two "bad file descriptor" messages, one from the wc -l
and the other from the cat <&4
(or the xargs -a /dev/fd/4
).
You’d need an unnamed pipe to open fd 4 onto, but the only official way to have unnamed pipes in Bash is actually through the coproc
command
The official way: coproc
There can be quite a few approaches to using coproc, however for your case I suppose the best one would be as follows:
cat email.txt | (coproc cat ; : input<&$COPROC[0] output>&$COPROC[1] ; tee >( sed -e '1,/^$/d' >&$output) | sed -e "1,/^$//^Subject:/Is/$/ ($(sed -ne '/^EOF$/q;p' <&$input))/" ; )
The above command line should yield the intended result as per your example case.
Broken down for explanation: (only for clarity purposes, it cannot work when copied&pasted)
cat email.txt | # pipe data to ...
( # a subcommand statement, which ...
coproc cat ; # ... first spawns the coprocess, a simple cat command
: cp_output<&$COPROC[0]- cp_input>&$COPROC[1]- ; # then moves coproc own fds into new ones whose number are put into (arbitrary) variables $cp_output and $cp_input
tee # and then mirrors the data from main stdin to ...
>( # ... the side processing, which here has to be a compound statement that ...
wc -l; # ... first counts the body lines ...
echo EOF ; # ... then sends an (arbitrary) string for notifying end-of-data ...
>&$cp_input # ... to the coproc input
)
| # the tee also pipes all main input to ...
sed -e # a sed command which looks for Subject: line in header part
"1,/^$//^Subject:/Is/$/ (" # so to append the outcome of the coproc, (note the process expansion below), which needs ...
"$(sed -ne '/^EOF$/q;p' <&$cp_output)" # capturing the (arbitrary) EOF string to quit the reading from the coproc
")/" ;
)
A few additional notes:
- a subcommand statement is recommended so that no coproc’s data (ie process and fds) leaks to the interactive bash (assuming you run this beast interactively!)
- else the management of this coproc’s data is completely up to you, so you may need eg to close the fds explicitly by
exec cp_input<&-
orexec COPROC[1]<&-
- you can use any command with coproc but I always found that using a simple
cat
bridging the two fds makes a handy general purpose solution; however you can optimize towards performance if you manage to embed any one worker process into the coproc itself; in this example you’d need a lot of rearranging the whole command line - as per Bash v4 documentation, Bash supports only one coproc at a time
- however, at least on v4.3 onwards it does accept more coprocs, though with an explicit warning, and Bash v5 docs does not state any limit
- in case of more coprocs you want to use explicit names for each coproc (see the docs for details)
- moving/copying coproc’s fds to arbitrary fds is required for them to survive the pipelines and process substitutions used in this example as the
$COPROC[*]
array does not export to child processes and its own fds are always closed on exec - the use of an in-band EOF notification string is not strictly required, but I often found it hard to make other approaches synchronously correct
- the piece retrieving the side-band data is the
$(sed -ne '/^EOF$/q;p' <&$cp_output)
; here a process expansion is required because this data carries the EOF string to be intercepted, but if you manage to move that need away from the coproc you can then just read the$cp_output
fd directly as in eg yourxargs -a
command
Then there is also
The unofficial way: true unnamed pipes
This feature is still undocumented as of Bash v5, but works on at least v4.3 (couldn’t test v5 yet).
Unnamed pipes are obtainable using the <(:)
redirection syntax.
The same example with unnamed pipes boils down to the following:
cat email.txt | : pipe<> <(:) ; tee >( sed -e '1,/^$/d' >&$pipe)
Broken down for explanation: (only for clarity purposes, it cannot work when copied&pasted)
cat email.txt | # pipe data to ...
# the tee also pipes all main input to ...
sed -e # a sed command which looks for Subject: line in header part
"/1,^$//^Subject:/Is/$/ (" # to append the outcome of $pipe fd, (note the process expansion below), which needs ...
"$(sed -ne '/^EOF$/q;p' <&$pipe)" # capturing the (arbitrary) EOF string to quit the reading from $pipe fd
")/" ;
Again a few additional notes:
- opening the unnamed pipe RW is required as I found no way to rather open the usual pair of pipes being one the read-end and the other its write-end
- this means there can’t be the usual EOF event notifying the read part that no more data will come, you have to do it your own in some other way and here I went again for an in-band EOF string, probably the simplest approach synchronous-wise
- like the coproc’s fds, the management of these unnamed pipes are completely up to you, so you may need to close them explicitly by
exec pipe<&-
; in this example I didn’t need to do it because the fds are created in a subprocess (the pipeline)
This solution makes for some less typing (maybe nicer for a command line) and certainly better performance compared to the coproc solution as here there’s no cat
command (nor anything else) bridging two fds, rather it’s really just a direct “loopback connection”. Also, I believe it makes for a syntactically smoother solution than coproc especially when you need more than one concurrent channel.
New contributor
This is another interesting solution for the specific use-case of the example with the need for a fifo being replaced by the need for a final sorting step. Unfortunately my real, more complicated use-case actually requires that the result of the 1st child process somehow get into the command line of the 2nd. (It's about reformatting a table: one process returns a list of the longest content for each column (e.g. "12:8:23:5") and the other process needs this as an option (--colwidth=12:8:23:5) on its commandline to do the work.)
– serolmy
17 hours ago
1
Oh I see, you’d like a sort of side-band channel. Then you may have a use of the (albeit undocumented) unnamed-pipe in bash. For your OP example you’d do something like this:cat email.txt | sed -e "/^Subject:/cSubject: $(sed -ne '/^EOF$/q;p' <&$pipe)" ;
. If this achieves the intended result I will update my post with a full explanation
– LL3
12 hours ago
Wow. This really weird looking stuff works perfectly (well almost, in the example the number of lines is appended, but anyway) on the 1st try (withecho -e "From: foonTo: barnSubject: blahnnbody1nbody2nbody3nnbody4"
instead ofcat email.txt
).
– serolmy
10 hours ago
Right you are! Of course! :D I’ll put the corrected (ie appending) sed in my Answer (though I’m sure you corrected it yourself already). Good!
– LL3
9 hours ago
add a comment |
The errors in your command are because fd 4 is not open at all.
In fact you receive two "bad file descriptor" messages, one from the wc -l
and the other from the cat <&4
(or the xargs -a /dev/fd/4
).
You’d need an unnamed pipe to open fd 4 onto, but the only official way to have unnamed pipes in Bash is actually through the coproc
command
The official way: coproc
There can be quite a few approaches to using coproc, however for your case I suppose the best one would be as follows:
cat email.txt | (coproc cat ; : input<&$COPROC[0] output>&$COPROC[1] ; tee >( sed -e '1,/^$/d' >&$output) | sed -e "1,/^$//^Subject:/Is/$/ ($(sed -ne '/^EOF$/q;p' <&$input))/" ; )
The above command line should yield the intended result as per your example case.
Broken down for explanation: (only for clarity purposes, it cannot work when copied&pasted)
cat email.txt | # pipe data to ...
( # a subcommand statement, which ...
coproc cat ; # ... first spawns the coprocess, a simple cat command
: cp_output<&$COPROC[0]- cp_input>&$COPROC[1]- ; # then moves coproc own fds into new ones whose number are put into (arbitrary) variables $cp_output and $cp_input
tee # and then mirrors the data from main stdin to ...
>( # ... the side processing, which here has to be a compound statement that ...
wc -l; # ... first counts the body lines ...
echo EOF ; # ... then sends an (arbitrary) string for notifying end-of-data ...
>&$cp_input # ... to the coproc input
)
| # the tee also pipes all main input to ...
sed -e # a sed command which looks for Subject: line in header part
"1,/^$//^Subject:/Is/$/ (" # so to append the outcome of the coproc, (note the process expansion below), which needs ...
"$(sed -ne '/^EOF$/q;p' <&$cp_output)" # capturing the (arbitrary) EOF string to quit the reading from the coproc
")/" ;
)
A few additional notes:
- a subcommand statement is recommended so that no coproc’s data (ie process and fds) leaks to the interactive bash (assuming you run this beast interactively!)
- else the management of this coproc’s data is completely up to you, so you may need eg to close the fds explicitly by
exec cp_input<&-
orexec COPROC[1]<&-
- you can use any command with coproc but I always found that using a simple
cat
bridging the two fds makes a handy general purpose solution; however you can optimize towards performance if you manage to embed any one worker process into the coproc itself; in this example you’d need a lot of rearranging the whole command line - as per Bash v4 documentation, Bash supports only one coproc at a time
- however, at least on v4.3 onwards it does accept more coprocs, though with an explicit warning, and Bash v5 docs does not state any limit
- in case of more coprocs you want to use explicit names for each coproc (see the docs for details)
- moving/copying coproc’s fds to arbitrary fds is required for them to survive the pipelines and process substitutions used in this example as the
$COPROC[*]
array does not export to child processes and its own fds are always closed on exec - the use of an in-band EOF notification string is not strictly required, but I often found it hard to make other approaches synchronously correct
- the piece retrieving the side-band data is the
$(sed -ne '/^EOF$/q;p' <&$cp_output)
; here a process expansion is required because this data carries the EOF string to be intercepted, but if you manage to move that need away from the coproc you can then just read the$cp_output
fd directly as in eg yourxargs -a
command
Then there is also
The unofficial way: true unnamed pipes
This feature is still undocumented as of Bash v5, but works on at least v4.3 (couldn’t test v5 yet).
Unnamed pipes are obtainable using the <(:)
redirection syntax.
The same example with unnamed pipes boils down to the following:
cat email.txt | : pipe<> <(:) ; tee >( sed -e '1,/^$/d' >&$pipe)
Broken down for explanation: (only for clarity purposes, it cannot work when copied&pasted)
cat email.txt | # pipe data to ...
# the tee also pipes all main input to ...
sed -e # a sed command which looks for Subject: line in header part
"/1,^$//^Subject:/Is/$/ (" # to append the outcome of $pipe fd, (note the process expansion below), which needs ...
"$(sed -ne '/^EOF$/q;p' <&$pipe)" # capturing the (arbitrary) EOF string to quit the reading from $pipe fd
")/" ;
Again a few additional notes:
- opening the unnamed pipe RW is required as I found no way to rather open the usual pair of pipes being one the read-end and the other its write-end
- this means there can’t be the usual EOF event notifying the read part that no more data will come, you have to do it your own in some other way and here I went again for an in-band EOF string, probably the simplest approach synchronous-wise
- like the coproc’s fds, the management of these unnamed pipes are completely up to you, so you may need to close them explicitly by
exec pipe<&-
; in this example I didn’t need to do it because the fds are created in a subprocess (the pipeline)
This solution makes for somewhat less typing (maybe nicer on a command line) and certainly better performance compared to the coproc solution, as here there’s no cat command (nor anything else) bridging two fds; rather, it’s really just a direct “loopback connection”. Also, I believe it makes for a syntactically smoother solution than coproc, especially when you need more than one concurrent channel.
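To illustrate the point about several concurrent channels, here is a small sketch with two independent pipes, each carrying its own result and its own EOF marker. It assumes Linux and Bash 4.1 or later; the names lines and words and the sample text are only illustrative.

```shell
#!/usr/bin/env bash
# Two independent loopback channels, one unnamed pipe each.
exec {lines}<> <(:)
exec {words}<> <(:)

text=$'one two\nthree four five\n'

# tee feeds both side processes at once; each one reports on its own pipe.
printf '%s' "$text" |
  tee >({ wc -l ; echo EOF ; } >&"$lines") \
      >({ wc -w ; echo EOF ; } >&"$words") >/dev/null

# Each reader blocks until its marker arrives, so no extra waiting is needed.
n_lines=$(sed -ne '/^EOF$/q;p' <&"$lines")
n_words=$(sed -ne '/^EOF$/q;p' <&"$words")
echo "lines=$((n_lines)) words=$((n_words))"

exec {lines}>&- {words}>&-
```

Adding a third channel is one more exec line and one more process substitution, with no named coprocs to keep track of.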
This is another interesting solution for the specific use-case of the example with the need for a fifo being replaced by the need for a final sorting step. Unfortunately my real, more complicated use-case actually requires that the result of the 1st child process somehow get into the command line of the 2nd. (It's about reformatting a table: one process returns a list of the longest content for each column (e.g. "12:8:23:5") and the other process needs this as an option (--colwidth=12:8:23:5) on its commandline to do the work.)
– serolmy
17 hours ago
Oh I see, you’d like a sort of side-band channel. Then you may have a use for the (albeit undocumented) unnamed pipe in bash. For your OP example you’d do something like this: cat email.txt | sed -e "/^Subject:/cSubject: $(sed -ne '/^EOF$/q;p' <&$pipe)" ; . If this achieves the intended result I will update my post with a full explanation
– LL3
12 hours ago
Wow. This really weird looking stuff works perfectly (well almost, in the example the number of lines is appended, but anyway) on the 1st try (with echo -e "From: foo\nTo: bar\nSubject: blah\n\nbody1\nbody2\nbody3\n\nbody4" instead of cat email.txt).
– serolmy
10 hours ago
Right you are! Of course! :D I’ll put the corrected (ie appending) sed in my Answer (though I’m sure you corrected it yourself already). Good!
– LL3
9 hours ago
edited 3 hours ago
answered yesterday
LL3