awk: extract string from a field [closed]

2 votes

In the input, fields are separated by a pipe sign (|):

CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we
DDDDD|Sess C2|s4 DB=yy8:@##;/u/ba

I want output in which only the last field is changed: it should contain just the text between the first = and the : in that field.

Expected output:

CCCC|Sess C1|yy07
DDDDD|Sess C2|yy8

Closed as unclear what you're asking by Sparhawk, Rui F Ribeiro, andcoz, RalfFriedl, DarkHeart, Sep 11 at 1:28.

Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.

  • What do you mean by "I want to get output where last field is changed"? What exactly defines the expected output? Is it the part before the second |, plus the part between = and :? Please edit your question to add this information.
    – Sparhawk
    Sep 10 at 12:23

  • Output columns are also separated by | (pipe). Only the last column changes: I need to print just what is between the first = and the first : of the original last column.
    – Chris
    Sep 10 at 12:28

Tags: shell-script, awk, gawk

asked Sep 10 at 12:04 by Chris, edited Sep 10 at 12:18

3 Answers

6 votes

Standard awk is not very good at extracting data out of fields based on patterns. Some options include:

  • split() to split the text into an array based on specified delimiters.

  • match(), which sets the RSTART and RLENGTH variables to indicate where the match occurred, followed by substr() to extract the matched portion.

So here:

awk -F'|' -v OFS='|' '
  split($3, a, /[=:]/) >= 2 {print $1, $2, a[2]}' < file.txt

This returns the portion between the first and second occurrence of a = or : in $3.
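
To make the choice of a[2] concrete, the sketch below prints every piece that split() produces for the first sample line of the question. The function name show_pieces is made up for this illustration.

```shell
# Hypothetical helper: list the pieces split() produces for field 3.
show_pieces() {
  awk -F'|' '{
    n = split($3, a, /[=:]/)          # n is the number of pieces
    for (i = 1; i <= n; i++) printf "a[%d]=%s\n", i, a[i]
  }'
}

pieces=$(printf '%s\n' 'CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we' | show_pieces)
printf '%s\n' "$pieces"
```

For this input the pieces are "s1 DA", "yy07" and "@##;/u/t/we", so a[2] is the wanted text. That only holds while no : appears before the first = in the field.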



Or:

awk -F'|' -v OFS='|' '
  match($3, /=[^:]*/) {
    print $1, $2, substr($3, RSTART+1, RLENGTH-1)
  }' < file.txt
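
As a rough illustration of what match() leaves in RSTART and RLENGTH, this sketch reports both values for each line, along with the text that the substr() call would extract. The helper name show_match and the "no match" message are assumptions made for the example.

```shell
# Hypothetical helper: report what match() sets for field 3.
show_match() {
  awk -F'|' '{
    if (match($3, /=[^:]*/))
      printf "RSTART=%d RLENGTH=%d text=%s\n", RSTART, RLENGTH, substr($3, RSTART + 1, RLENGTH - 1)
    else
      print "no match"                # match() returned 0
  }'
}

printf '%s\n' 'CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we' 'DDDDD|Sess C2|s4 DB=yy8:@##;/u/ba' | show_match
```

The match starts at the = itself, which is why the extraction skips one character (RSTART+1) and takes one character fewer than the match length (RLENGTH-1).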


GNU awk has a gensub() extension which brings the functionality of sed's s command into awk:

gawk -F'|' -v OFS='|' '
  $3 ~ /=/ {
    print $1, $2, gensub(/^[^=]*=([^:]*).*/, "\\1", 1, $3)
  }' < file.txt

This looks for = followed by any number of non-: characters and extracts the part after the =. The problem with gensub() is that you can't easily tell whether the substitution was successful, hence the check that $3 contains = first.

With sed:

sed -n 's/^\([^|]*|[^|]*|\)[^=|]*=\([^:|]*\).*/\1\2/p' < file.txt

With perl:

perl -F'[|]' -lane 'print "$F[0]|$F[1]|$1" if $F[2] =~ /=([^:]*)/' < file.txt

  • Damn, you were faster. I tried with gawk: awk -F '|' -v OFS='|' '{print $1,$2,gensub(/^[^=]*=([^:]*).*$/, "\\1", "1", $3)}' < file.txt which is pretty much the same as your suggestion.
    – rexkogitans
    Sep 10 at 14:04

  • @rexkogitans, thanks. That made me realise that my use of $3 = gensub(... as the condition was wrong.
    – Stéphane Chazelas
    Sep 10 at 14:28

  • The OP does not mention a condition for the 3rd column at all. I assume they are all formatted like this, so I suggest dropping the condition on the main block altogether.
    – rexkogitans
    Sep 10 at 15:06

  • @rexkogitans, I've made them all print those 3 fields only if the 3rd field of the input is in the expected format. I'll leave it as is unless the OP clarifies what to do when the input is not in the expected format.
    – Stéphane Chazelas
    Sep 10 at 15:33
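
One possible way to settle the point left open in these comments is to pass a line through unchanged when its third field has no =. That policy is an assumption, since the question never says what should happen to such lines, and the function name extract_or_keep is hypothetical.

```shell
# Hypothetical helper: extract when field 3 matches, otherwise keep the line.
extract_or_keep() {
  awk -F'|' -v OFS='|' '
    match($3, /=[^:]*/) { print $1, $2, substr($3, RSTART + 1, RLENGTH - 1); next }
    { print }                          # assumed fallback: print the line as it is
  '
}

printf '%s\n' 'CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we' 'EEE|Sess C3|no marker' | extract_or_keep
```

Dropping the second rule gives the behaviour of the answer above, where lines that do not fit the expected format are skipped.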

















4 votes

I would try

awk -F\| 'BEGIN {OFS="|"}
{ col=index($3,":");
equ=index($3,"=");
$3=substr($3,equ+1,col-equ-1);
print ; }' se

where

  • -F\| tells awk to use | as the input separator

  • equ=index($3,"="); gets the index of the first = in the third field (col does the same for the first :)

  • $3=substr($3,equ+1,col-equ-1); does the actual substitution, keeping only the text between the two
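
index() returns 0 when the character is missing, so the substr() call above yields an empty field if there is no :. Here is a guarded sketch under two assumptions that the question does not settle: lines without = are printed unchanged, and a missing : means "take the rest of the field". It also looks for the : only after the first =, which is the same position for the sample input. The name extract_by_index is hypothetical.

```shell
# Hypothetical helper: index()-based extraction with guards for missing = or :.
extract_by_index() {
  awk -F'|' -v OFS='|' '{
    equ = index($3, "=")                    # 0 if there is no =
    if (equ == 0) { print; next }           # assumed policy: leave the line alone
    rest = substr($3, equ + 1)              # everything after the first =
    col = index(rest, ":")                  # first : after that =, 0 if none
    $3 = (col ? substr(rest, 1, col - 1) : rest)
    print
  }'
}

printf '%s\n' 'CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we' 'DDDDD|Sess C2|s4 DB=yy8:@##;/u/ba' | extract_by_index
```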





0 votes

The first sub removes the first six characters of field 3, and the second sub removes everything from the first colon onward.

awk -F\| '{sub(/.{6}/,"",$3); sub(/:.*/,"")}1' OFS=\| file
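
The fixed count of six characters matches the sample data, where the prefix before the value is always "s1 DA=" or similar. As a variation that does not rely on that width, the sketch below removes everything up to the first = instead and applies both substitutions to field 3 only. It is an alternative under the assumption that prefixes can vary in length, not the answer's own method, and the name extract_after_equals is made up.

```shell
# Hypothetical helper: strip up to the first =, then from the first : onward.
extract_after_equals() {
  awk -F'|' -v OFS='|' '{
    sub(/^[^=]*=/, "", $3)     # drop everything up to and including the first =
    sub(/:.*/, "", $3)         # drop the first : and whatever follows it
    print
  }'
}

printf '%s\n' 'CCCC|Sess C1|s1 DA=yy07:@##;/u/t/we' 'DDDDD|Sess C2|s10 DB=yy8:@##;/u/ba' | extract_after_equals
```

Restricting the second sub() to $3 also keeps a colon in an earlier field from truncating the line.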





Answer with 6 votes: answered Sep 10 at 12:40 by Stéphane Chazelas, edited Sep 10 at 14:27

Answer with 4 votes: answered Sep 10 at 12:39 by Archemar

Answer with 0 votes: answered Sep 10 at 21:15 by Claes Wikner