Matching two files and printing lines that appear for the first time

I have two files that look like this:



file1 (unique IDs):



C84610112
C96209347
C84774620
C84774691
C85594749
C89372772
C89651687
C89845500
C89914896
C91269765
C91526663
C92210411
C92254517
C93709504
C94303303
C95100561
C95100609
C95417520
C95696352
C96045246
C96045496
C96060727
C96076986


and file2:



1 C95696352 score: -69.785 nathvy = 38 nconfs = 888
2 C98230482 score: -57.431 nathvy = 47 nconfs = 575
3 C96209347 score: -57.128 nathvy = 24 nconfs = 1188
4 C36510773 score: -56.502 nathvy = 38 nconfs = 7595
5 C04355288 score: -56.400 nathvy = 41 nconfs = 50502
6 C89372772 score: -55.728 nathvy = 22 nconfs = 3228
7 C96209347 score: -54.713 nathvy = 24 nconfs = 162
8 C96209347 score: -53.901 nathvy = 24 nconfs = 159
9 C06169346 score: -53.438 nathvy = 22 nconfs = 105
10 C95696352 score: -52.848 nathvy = 38 nconfs = 878
11 C98216318 score: -52.061 nathvy = 52 nconfs = 1092
12 C04285713 score: -52.009 nathvy = 38 nconfs = 1355
13 C96209347 score: -51.477 nathvy = 24 nconfs = 1375
14 C98222837 score: -50.730 nathvy = 34 nconfs = 588
15 C98216318 score: -50.694 nathvy = 52 nconfs = 1136
16 C32832068 score: -50.546 nathvy = 22 nconfs = 548
17 C95696352 score: -50.475 nathvy = 38 nconfs = 3220
18 C32832068 score: -50.457 nathvy = 22 nconfs = 16235
19 C95696352 score: -50.234 nathvy = 38 nconfs = 3048
20 C85594749 score: -49.780 nathvy = 44 nconfs = 4536
21 C72332782 score: -49.676 nathvy = 41 nconfs = 3942
22 C97970648 score: -49.616 nathvy = 45 nconfs = 17640
23 C04285713 score: -49.594 nathvy = 38 nconfs = 14038
24 C98043133 score: -49.370 nathvy = 43 nconfs = 1236
25 C89372772 score: -49.308 nathvy = 22 nconfs = 471
26 C97970648 score: -49.297 nathvy = 45 nconfs = 17850
27 C85594749 score: -49.122 nathvy = 44 nconfs = 4158
28 C70006381 score: -49.092 nathvy = 24 nconfs = 880


I would like to match the IDs in file1 against the IDs in the second column of file2 and print the matching lines. Some IDs repeat in file2, such as C96209347 (although the whole lines are not identical). For each ID I want only the line where it appears for the first time, and the others should be skipped. So in this example, for C96209347 only the third line of file2 should be printed. Can anybody help?










Tags: command-line text-processing






edited Aug 31 at 7:45 by pa4080
asked Aug 31 at 7:25 by sergio




















2 Answers

















            Accepted answer:
            Try this:



            grep -f file1 file2 | awk '!_[$2]++'

            1 C95696352 score: -69.785 nathvy = 38 nconfs = 888
            3 C96209347 score: -57.128 nathvy = 24 nconfs = 1188
            6 C89372772 score: -55.728 nathvy = 22 nconfs = 3228
            20 C85594749 score: -49.780 nathvy = 44 nconfs = 4536


            Explanation




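            • grep -f file1 file2: print the lines of file2 that match any of the patterns listed in file1 (one pattern per line). The patterns are treated as regular expressions and may match anywhere in a line, not only in the second column.


            • awk '!_[$2]++': print a line only if its second field ($2) has not been seen before.


              • _ is the array name (it can be anything, e.g. "seen").


              • _[$2]++ looks up the array entry whose key is the content of field $2, returns its current value (empty, i.e. 0, the first time) and then adds 1 to it.

              • ! negates that value, so the condition is true only on the first occurrence of each key. When a condition has no action block, awk's default action is to print the line.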
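            For readers who find the !_[$2]++ idiom terse, here is a sketch of the same first-occurrence filter written out long-hand as a shell function. The function name first_by_id, the array name seen and the sample lines are illustrative assumptions, not part of the answer.

```shell
#!/bin/sh
# Sketch (not from the original answer): the awk '!_[$2]++' filter
# written out long-hand. The function name first_by_id and the array
# name "seen" are illustrative choices.
first_by_id() {
    awk '
    {
        # seen[$2]++ returns the old count for this ID (0 the first time)
        # and then increments it, so the test is true only once per ID.
        if (seen[$2]++ == 0)
            print
    }'
}

# Hypothetical sample lines shaped like file2.
printf '%s\n' \
    '3 C96209347 score: -57.128' \
    '7 C96209347 score: -54.713' \
    '20 C85594749 score: -49.780' \
    '27 C85594749 score: -49.122' | first_by_id
```

            Applied to the question's files, grep -F -f file1 file2 | first_by_id should print the same four lines as the one-liner. The POSIX -F option is an optional tweak here: it makes grep treat each line of file1 as a fixed string instead of a regular expression.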






            edited Aug 31 at 7:52
            answered Aug 31 at 7:40 by RoVo







            Comment by sergio (Aug 31 at 7:45): This works. Thank you very much! All the best












                With awk alone:



                $ awk 'NR==FNR {a[$1]=1; next} $2 in a {print; delete a[$2]}' file1 file2
                1 C95696352 score: -69.785 nathvy = 38 nconfs = 888
                3 C96209347 score: -57.128 nathvy = 24 nconfs = 1188
                6 C89372772 score: -55.728 nathvy = 22 nconfs = 3228
                20 C85594749 score: -49.780 nathvy = 44 nconfs = 4536
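                The two-file awk idiom used above, sketched with comments as a shell function. The function name first_in_list, the array name ids and the demo data are hypothetical; the logic is the same: remember the IDs from the first file, then print each matching line of the second file only once.

```shell
#!/bin/sh
# Sketch: the two-file awk idiom with comments. The function name
# first_in_list, the array name ids and the sample data are hypothetical.
first_in_list() {
    awk '
    # NR == FNR is true for every line of the first file:
    # store each ID from its first column as an array key.
    NR == FNR { ids[$1] = 1; next }

    # Second file: print a line whose column 2 is a stored ID, then
    # delete that ID so later lines with the same ID are skipped.
    $2 in ids { print; delete ids[$2] }
    ' "$1" "$2"
}

# Demo on temporary files (removed afterwards).
tmp=$(mktemp -d)
printf '%s\n' C96209347 C85594749 > "$tmp/file1"
printf '%s\n' \
    '3 C96209347 score: -57.128' \
    '4 C36510773 score: -56.502' \
    '7 C96209347 score: -54.713' \
    '20 C85594749 score: -49.780' > "$tmp/file2"
first_in_list "$tmp/file1" "$tmp/file2"
rm -r "$tmp"
```

                Because awk splits fields on runs of blanks, leading or trailing spaces around an ID in either file do not affect the match, and the comparison on column 2 is exact rather than a substring search. One caveat of the NR == FNR test: if file1 is empty, it also holds for every line of file2, so nothing is printed.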






                answered Aug 31 at 10:57 by steeldriver



























                     
