How to convert an emoticon specified by a U+xxxxx code to utf-8?













16















Emoticons seem to be specified using a format of U+xxxxx

wherein each x is a hexadecimal digit.



For example, U+1F615 is the official Unicode Consortium code for the "confused face" 😕



As I am often confused, I have a strong affinity for this symbol.



The U+1F615 representation confuses me because I thought Unicode characters could only be encoded in 8, 16, 24 or 32 bits, whereas 5 hex digits take 5×4 = 20 bits.



I've discovered that this symbol seems to be represented by a completely different hex string in bash:



$ echo -n 😕 | hexdump
0000000 f0 9f 98 95
0000004

$ echo -e "\xf0\x9f\x98\x95"
😕

$ PS1=$'\xf0\x9f\x98\x95 >'
😕 >


I would have expected U+1F615 to convert to something like \x00 \x01 \xF6 \x15.

I don't see the relationship between these two encodings.



When I look up a symbol in the official Unicode Consortium list, I would like to be able to use that code directly without having to convert it manually in this tedious fashion, i.e.:

  • finding the symbol on some web page

  • copying it to the clipboard of the web browser

  • pasting it in bash to echo through a hexdump to discover the REAL code.

Can I use this 20-bit code to determine what the 32-bit code is?

Does a relationship exist between these two numbers?










      shell character-encoding unicode






      edited Jan 19 '16 at 12:59 by tarleb

      asked Dec 30 '15 at 6:33 by Alex Ryan




















          3 Answers


















          20














          UTF-8 is a variable-length encoding of Unicode. It is designed to be a superset of ASCII. See Wikipedia for details of the encoding. \x00 \x01 \xF6 \x15 would be the UCS-4BE or UTF-32BE encoding.



          To get from the Unicode code point to the UTF-8 encoding, assuming the locale's charmap is UTF-8 (see the output of locale charmap), it's just:



          $ printf '\U1F615\n'
          😕
          $ echo -e '\U1F615'
          😕
          $ confused_face=$'\U1F615'


          The latter will be in the next version of the POSIX standard.



          AFAIK, that syntax was introduced in 2000 by the stand-alone GNU printf utility (as opposed to the printf utility of the GNU shell), brought to echo/printf/$'...' builtins first by zsh in 2003, ksh93 in 2004, bash in 2010, but was obviously inspired by other languages.



          ksh93 also supports it as printf '\x{1f615}\n' and printf '\u{1f615}\n'.

          $'\uXXXX' and $'\UXXXXXXXX' are supported by zsh, bash, ksh93, mksh and FreeBSD sh, GNU printf, GNU echo.

          Some require all the digits (as in \U0001F615 as opposed to \U1F615), though that's likely to change in future versions as POSIX will allow fewer digits. In any case, you need all the digits if the \UXXXXXXXX is to be followed by hexadecimal digits, as in \U0001F615FOX: \U1F615FOX would have been parsed as $'\U001F615F'OX.



          Some expand to the characters in the current locale's encoding at the time the string is parsed or at the time it is expanded, some only in UTF-8 regardless of the locale. If the character is not available in the current locale's encoding, the behaviour varies between shells.



          So, for best portability, only use it in UTF-8 locales, write all eight digits, and use it inside $'...':

          printf '%s\n' $'\U0001F615'


          Note that:

          LC_ALL=C.UTF-8; printf '%s\n' $'\U0001F615'

          or:

          {
            LC_ALL=C.UTF-8
            printf '%s\n' $'\U0001F615'
          }

          will not work with all shells (including bash) because the $'\U0001F615' is parsed before LC_ALL is assigned. (Also note that there's no guarantee that a system will have a locale called C.UTF-8.)

          You'd need:

          LC_ALL=C.UTF-8; eval "confused_face=\$'\U0001F615'"

          Or:

          LC_ALL=C.UTF-8
          printf '%s\n' $'\U0001F615'

          (not within a compound command or function).




          For the reverse, to get from the UTF-8 encoding to the Unicode code-point, see this other question or that one.



          $ unicode 😕
          U+1F615 CONFUSED FACE
          UTF-8: f0 9f 98 95 UTF-16BE: d83dde15 Decimal: &#128533;
          😕
          Category: So (Symbol, Other)
          Bidi: ON (Other Neutrals)

          $ perl -CA -le 'printf "%x\n", ord shift' 😕
          1f615
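For illustration, the reverse direction can also be done with plain shell arithmetic. This is only a sketch: `utf8_to_codepoint` is a hypothetical helper name, it expects the UTF-8 bytes of a single character as separate hex arguments, and it assumes the input is well-formed (no validation is attempted).

```shell
# utf8_to_codepoint is a hypothetical helper (not part of any tool above).
# Arguments: the UTF-8 bytes of ONE character, each as a hex string.
# Assumes well-formed input; invalid sequences are not detected.
utf8_to_codepoint() (
    first=$((0x$1))
    # The high bits of the lead byte mark the sequence length; mask them off
    # to keep only the payload bits.
    if   [ "$first" -lt $((0x80)) ]; then cp=$first              # 0xxxxxxx
    elif [ "$first" -lt $((0xE0)) ]; then cp=$((first & 0x1F))   # 110xxxxx
    elif [ "$first" -lt $((0xF0)) ]; then cp=$((first & 0x0F))   # 1110xxxx
    else                                  cp=$((first & 0x07))   # 11110xxx
    fi
    shift
    # Each continuation byte (10xxxxxx) contributes its low 6 bits.
    for byte in "$@"; do
        cp=$(( (cp << 6) | (0x$byte & 0x3F) ))
    done
    printf 'U+%04X\n' "$cp"
)

utf8_to_codepoint f0 9f 98 95   # prints U+1F615
```

The bytes passed in the last line are the ones shown in the hexdump output in the question.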





          edited yesterday; answered Dec 30 '15 at 7:59 by Stéphane Chazelas
          • 2





            Notice that if \U1F615 is followed by another valid hexadecimal digit, that digit will be taken as part of the escape sequence. To make it work regardless of what follows, it needs enough leading zeros to be exactly eight digits long: \U0001F615

            – kasperd
            Dec 30 '15 at 9:18











          • @kasperd, thanks. Yes, it's worth noting. I've included that in the answer.

            – Stéphane Chazelas
            Dec 30 '15 at 9:49



















          7














          Here's a way to convert from UTF-32 (big endian) to UTF-8:

          $ confused=$(echo -ne "\x00\x01\xF6\x15" | iconv -f UTF-32BE -t UTF-8)
          $ echo "$confused"
          😕


          You'll notice your hex value 0x01F615 in there, padded with an extra leading 0 to fill 32 bits.



          The Wikipedia page on UTF-8 explains the transformation from a Unicode codepoint to its UTF-8 representation very clearly. But trying to do it yourself in shell scripting might not be the best idea.



          UTF-32 is fixed-width, and the correspondence between codepoint and UTF-32 representation is trivial - the value is the same.
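To make that correspondence concrete, here is a minimal sketch that prints the four UTF-32BE bytes for a code point. The helper name `utf32be_hex` is hypothetical, and the code point is assumed to be given in hex without the U+ prefix.

```shell
# utf32be_hex is a hypothetical helper name used only for this sketch.
# Argument: a code point in hex, without the "U+" prefix.
utf32be_hex() (
    # Zero-pad the code point to 8 hex digits (32 bits)...
    padded=$(printf '%08x' "$((0x$1))")
    # ...then split the digits into four bytes, most significant first.
    printf '%s\n' "$padded" | sed 's/../& /g; s/ $//'
)

utf32be_hex 1F615   # prints 00 01 f6 15
```

Those are the same four bytes that the iconv command above reads as its UTF-32BE input.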






            6














            Nice way to do it in your head or on paper:

            1. Figure out how many bytes it will be: values below U+0080 take 1 byte, otherwise values below U+0800 take 2 bytes, otherwise values below U+10000 take 3 bytes, otherwise 4 bytes. In your case, 4 bytes.

            2. Convert hex to octal: 0373025.

            3. Starting at the end, peel off 2 octal digits at a time to get a sequence of octal values: 037 030 025.

            4. If you have fewer octal values than the expected number of bytes, add an extra 0 at the beginning: 000 037 030 025.

            5. For all but the first, add on 0200 to get: 000 0237 0230 0225.

            6. For the first, add 0300 if the expected length is 2, 0340 if it's 3, or 0360 if it's 4, to get: 0360 0237 0230 0225.

            Now write as a string of octal escapes: \360\237\230\225. Optionally convert back to hex if you want.
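The steps above can be sketched in shell arithmetic. This is an illustration rather than a hardened tool: `utf8_octal` is a hypothetical helper name, the code point is assumed to be given in hex without the U+ prefix, and the one-byte (ASCII) case is handled separately because steps 5 and 6 do not apply to it.

```shell
# utf8_octal is a hypothetical helper name used only for this sketch.
# Argument: a code point in hex, without the "U+" prefix.
utf8_octal() (
    cp=$((0x$1))
    # Step 1: decide how many bytes the encoding needs.
    if   [ "$cp" -lt $((0x80)) ];    then n=1
    elif [ "$cp" -lt $((0x800)) ];   then n=2
    elif [ "$cp" -lt $((0x10000)) ]; then n=3
    else                                  n=4
    fi
    # A one-byte value is plain ASCII: nothing is added to it.
    if [ "$n" -eq 1 ]; then
        printf '\\%03o\n' "$cp"
        return
    fi
    # Steps 2-5: peel off two octal digits (6 bits) at a time from the end
    # and add 0200 to each of these continuation bytes.
    out=
    i=1
    while [ "$i" -lt "$n" ]; do
        out=$(printf '\\%03o' $(( (cp & 077) + 0200 )))$out
        cp=$((cp >> 6))
        i=$((i + 1))
    done
    # Step 6: add the length marker to what is left for the first byte.
    case $n in
        2) lead=$((0300)) ;;
        3) lead=$((0340)) ;;
        4) lead=$((0360)) ;;
    esac
    printf '\\%03o%s\n' $((cp + lead)) "$out"
)

utf8_octal 1F615   # prints \360\237\230\225
```

Shifting right by 6 bits is the same as dropping two octal digits, which is why the octal form makes the method easy to do by hand.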






              • 2





                Notice that if U1F615 is followed by another valid hexadecimal digit then that will be assumed to be part of the escape sequence. To make it work regardless of what it is followed by it has to have enough leading zeros to be exactly eight digits long: U0001F615

                – kasperd
                Dec 30 '15 at 9:18











              • @kasperd, thanks. Yes, it's worth noting. I've included that in the answer.

                – Stéphane Chazelas
                Dec 30 '15 at 9:49



























              7














              Here's a way to convert from UTF-32 (big endian) to UTF-8



              $ confused=$(echo -ne "\x00\x01\xF6\x15" | iconv -f UTF-32BE -t UTF-8)
              $ echo "$confused"
              😕


              You'll notice your hex value 0x01F615 in there, padded with a leading zero byte (\x00) to fill 32 bits.



              The Wikipedia page on UTF-8 explains the transformation from a Unicode codepoint to its UTF-8 representation very clearly. But trying to do it yourself in shell scripting might not be the best idea.



              UTF-32 is fixed-width, and the correspondence between codepoint and UTF-32 representation is trivial - the value is the same.
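
              The same idea can be wrapped in a small function so that the code point does not have to be split into bytes by hand. This is only a sketch: the function name codepoint_to_utf8 is made up for illustration, and it assumes an iconv that knows UTF-32BE. It builds the four big-endian bytes with printf octal escapes rather than echo -ne, since \x escapes in echo are not portable.

```shell
#!/bin/sh
# Hypothetical helper (not part of the answer above): print the UTF-8
# encoding of a code point given as hex digits, e.g. "1F615".
codepoint_to_utf8() {
    code=$((0x$1))
    # The UTF-32BE representation is just the code point as 4 big-endian
    # bytes. Format each byte as an octal escape such as \366.
    fmt=$(printf '\\%03o\\%03o\\%03o\\%03o' \
        $(( (code >> 24) & 255 )) $(( (code >> 16) & 255 )) \
        $(( (code >> 8) & 255 ))  $(( code & 255 )))
    # Emit the raw bytes and let iconv do the actual conversion.
    printf "$fmt" | iconv -f UTF-32BE -t UTF-8
}

# Show the resulting bytes in hex: f0 9f 98 95
codepoint_to_utf8 1F615 | od -An -tx1
```

              Called without the od pipe, codepoint_to_utf8 1F615 writes the character itself, which displays as 😕 in a UTF-8 terminal.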
















                  answered Dec 30 '15 at 7:16









                  Mat






















                      6














                      Nice way to do it in your head or on paper:



                      1. Figure out how many bytes it will be: values under U+0080 take 1 byte (the value itself), values under U+0800 take 2 bytes, values under U+10000 take 3 bytes, and anything above that takes 4 bytes. In your case (U+1F615), 4 bytes.


                      2. Convert hex to octal: 0x1F615 is 0373025.


                      3. Starting at the end, peel off 2 octal digits (6 bits) at a time to get a sequence of octal values: 037 030 025.


                      4. If you have fewer octal values than the expected number of bytes, add an extra 0 at the beginning: 000 037 030 025.


                      5. For all but the first, add 0200 to get: 000 0237 0230 0225.


                      6. For the first, add 0300 if the expected length is 2, 0340 if it's 3, or 0360 if it's 4, to get: 0360 0237 0230 0225.


                      Now write as a string of octal escapes: \360\237\230\225. Optionally convert back to hex if you want (here F0 9F 98 95).
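
                      For checking a hand calculation, the steps above can be sketched in plain shell arithmetic. The function name utf8_octal_escapes is invented for this example, and it does no validation of the input (surrogates and values above U+10FFFF are not rejected). Shifting right by 6 bits is the same as dropping two octal digits.

```shell
#!/bin/sh
# Hypothetical helper: print the octal escapes for the UTF-8 encoding of a
# code point given as hex digits (e.g. "1F615"). No input validation.
utf8_octal_escapes() {
    code=$((0x$1))
    # Step 1: number of bytes, and the marker to add to the first byte.
    if   [ "$code" -lt $((0x80)) ];    then n=1; lead=0
    elif [ "$code" -lt $((0x800)) ];   then n=2; lead=$((0300))
    elif [ "$code" -lt $((0x10000)) ]; then n=3; lead=$((0340))
    else                                    n=4; lead=$((0360))
    fi
    out=
    i=1
    # Steps 3 to 5: peel off two octal digits (6 bits) at a time from the
    # end and add 0200 to each of these continuation bytes.
    while [ "$i" -lt "$n" ]; do
        out=$(printf '\\%03o' $(( (code & 077) + 0200 )))$out
        code=$((code >> 6))
        i=$((i + 1))
    done
    # Step 6: whatever is left is the first byte; add the length marker.
    printf '\\%03o%s\n' $((code + lead)) "$out"
}

utf8_octal_escapes 1F615    # prints \360\237\230\225
```

                      The result can be passed straight to printf as a format string, for example printf "$(utf8_octal_escapes 1F615)", to emit the encoded bytes.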














                          edited Dec 30 '15 at 21:33

























                          answered Dec 30 '15 at 15:53









                          R..



























