Calculate Levenshtein distance between two strings in Python













I need a function that checks how different two strings are. I chose the Levenshtein distance as a quick approach, and implemented this function:

    from difflib import ndiff

    def calculate_levenshtein_distance(str_1, str_2):
        """
        The Levenshtein distance is a string metric for measuring the difference between two sequences.
        It is calculated as the minimum number of single-character edits necessary to transform one string into another
        """
        distance = 0
        buffer_removed = buffer_added = 0
        for x in ndiff(str_1, str_2):
            code = x[0]
            # Code ? is ignored as it does not translate to any modification
            if code == ' ':
                distance += max(buffer_removed, buffer_added)
                buffer_removed = buffer_added = 0
            elif code == '-':
                buffer_removed += 1
            elif code == '+':
                buffer_added += 1
        distance += max(buffer_removed, buffer_added)
        return distance

Then calling it as:

    similarity = 1 - calculate_levenshtein_distance(str_1, str_2) / max(len(str_1), len(str_2))

How sloppy/prone to errors is this code? How can it be improved?
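One edge case in the call above: when both strings are empty, max(len(str_1), len(str_2)) is 0 and the division raises ZeroDivisionError. A minimal sketch of a guarded version follows. The helper name similarity_from_distance and the decision to clamp at 0.0 are assumptions for illustration, not part of the original code. The clamp only matters if the distance can exceed the longer length, which a diff-based count may do.

```python
def similarity_from_distance(distance: int, len_1: int, len_2: int) -> float:
    """Map an edit distance to a score in [0.0, 1.0] (hypothetical helper)."""
    longest = max(len_1, len_2)
    if longest == 0:
        # Two empty strings: nothing to edit, so treat them as identical.
        return 1.0
    # Assumption: clamp so an overestimated distance never yields a negative score.
    return max(0.0, 1 - distance / longest)


print(similarity_from_distance(3, len("kitten"), len("sitting")))
```

Passing the lengths in explicitly keeps the helper independent of whichever distance function is used.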







Tags: python, edit-distance














asked Apr 8 at 10:01 by Kyra_W, edited Apr 8 at 19:14 by Reinderien
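For readers who have not used difflib.ndiff on plain strings: it treats each character as one "line" and yields strings whose first character is an edit code (' ' for unchanged, '-' for only in the first string, '+' for only in the second, '?' for hint lines). The sketch below simply prints what the loop in the question iterates over. edit_codes is a hypothetical helper name introduced here.

```python
from difflib import ndiff


def edit_codes(str_1, str_2):
    """Collect the leading code character of each item ndiff yields."""
    return [item[0] for item in ndiff(str_1, str_2)]


# Each item has the form "<code> <character>"; print them to inspect the diff.
for item in ndiff("abc", "abd"):
    print(repr(item))
print(edit_codes("abc", "abd"))
```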
























2 Answers

























































There is a module available for exactly that calculation, python-Levenshtein. You can install it with pip install python-Levenshtein.

It is implemented in C, so is probably faster than anything you can come up with yourself.

    from Levenshtein import distance as levenshtein_distance

According to the docstring conventions (PEP 257), your docstring should look like this, i.e. with the indentation aligned to the """ and the line length curtailed to 80 characters.

    def calculate_levenshtein_distance(str_1, str_2):
        """
        The Levenshtein distance is a string metric for measuring the difference
        between two sequences.
        It is calculated as the minimum number of single-character edits necessary
        to transform one string into another.
        """
        ...














answered Apr 8 at 10:37 by Graipher, edited Apr 8 at 10:43







• Just to note the module is licensed under GPL 2.0 so watch out if you're using it for work. – lucasgcb, Apr 8 at 13:08

• Just to point out a small nitpick to other people who may stumble upon this answer, as per help center: "Every answer must make at least one insightful observation about the code in the question. Answers that merely provide an alternate solution with no explanation or justification do not constitute valid Code Review answers and may be deleted." While this answer does provide alternative and existing module suggestion, it also goes into some suggestions about improving code quality. So it's an example of a decent answer. – Sergiy Kolodyazhnyy, Apr 9 at 0:02

• Thanks! I did not know of this module. Will check it out. – Kyra_W, 2 days ago

• @SergiyKolodyazhnyy While I (obviously) agree, and that is one of the reasons I added that part, I would actually argue that "It is implemented in C, so is probably faster than anything you can come up with yourself" would get around the "no explanation or justification" clause. – Graipher, 2 days ago
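If the GPL licence mentioned in the comments rules out python-Levenshtein, a dependency-free alternative is the classic dynamic-programming (Wagner-Fischer) formulation. The sketch below is not taken from either answer. levenshtein_dp and ndiff_distance are illustrative names, and ndiff_distance is a compact restatement of the question's function, included only for comparison. The difflib documentation notes that SequenceMatcher does not aim for minimal edit sequences, so the diff-based count is best read as an upper bound on the true distance. For 'xxm' and 'myy', for instance, the diff keeps the shared 'm' (two deletions plus two insertions), while three substitutions suffice.

```python
from difflib import ndiff


def levenshtein_dp(a: str, b: str) -> int:
    """Levenshtein distance via the two-row Wagner-Fischer recurrence."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # cost of deleting the first i characters of a
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def ndiff_distance(a: str, b: str) -> int:
    """Compact restatement of the diff-based count from the question."""
    distance = removed = added = 0
    for item in ndiff(a, b):
        if item[0] == " ":
            distance += max(removed, added)
            removed = added = 0
        elif item[0] == "-":
            removed += 1
        elif item[0] == "+":
            added += 1
    return distance + max(removed, added)


# Print both counts side by side for a couple of sample pairs.
for pair in [("kitten", "sitting"), ("xxm", "myy")]:
    print(pair, levenshtein_dp(*pair), ndiff_distance(*pair))
```

The two-row variant needs memory proportional to the shorter string only. Run time is still proportional to the product of the two lengths, so for long inputs a C implementation remains the faster choice.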



























The code itself is rather clear. There are some smaller changes I would make.

tuple unpacking

You can use tuple unpacking to do:

    for code, *_ in ndiff(str1, str2):

instead of:

    for x in ndiff(str_1, str_2):
        code = x[0]

dict results

Instead of separate counters for the additions and removals, I would keep them in one dict: counter = {"+": 0, "-": 0}

    from difflib import ndiff

    def levenshtein_distance(str1, str2):
        counter = {"+": 0, "-": 0}
        distance = 0
        for edit_code, *_ in ndiff(str1, str2):
            if edit_code == " ":
                distance += max(counter.values())
                counter = {"+": 0, "-": 0}
            else:
                # "+" or "-"; a "?" hint line would raise KeyError here
                counter[edit_code] += 1
        distance += max(counter.values())
        return distance

generators

A smaller, less useful variation is to let this function be a generator and use the builtin sum to add up the results. This saves one variable inside the function:

    def levenshtein_distance_gen(str1, str2):
        counter = {"+": 0, "-": 0}
        for edit_code, *_ in ndiff(str1, str2):
            if edit_code == " ":
                yield max(counter.values())
                counter = {"+": 0, "-": 0}
            else:
                counter[edit_code] += 1
        yield max(counter.values())

    sum(levenshtein_distance_gen(str1, str2))

timings

The differences in timings between the original and both these variations are minimal, and within the variation of results. This is rather logical, since for simple strings (aaabbbc and abcabcabc) 90% of the time is spent in ndiff.
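The claim that most of the time goes into ndiff can be checked with timeit. The sketch below is one possible setup, not the measurement the answer was based on. ndiff_only and time_call are hypothetical helper names, and the repeat count is arbitrary. It compares merely consuming ndiff against the full dict-based function on the two strings mentioned above.

```python
import timeit
from difflib import ndiff


def ndiff_only(str1, str2):
    """Baseline: exhaust ndiff without doing any counting."""
    for _ in ndiff(str1, str2):
        pass


def levenshtein_distance(str1, str2):
    """Dict-based variant from the answer, repeated so the block is runnable."""
    counter = {"+": 0, "-": 0}
    distance = 0
    for edit_code, *_ in ndiff(str1, str2):
        if edit_code == " ":
            distance += max(counter.values())
            counter = {"+": 0, "-": 0}
        else:
            counter[edit_code] += 1
    distance += max(counter.values())
    return distance


def time_call(func, *args, number=1000):
    """Total seconds for `number` calls of func(*args)."""
    return timeit.timeit(lambda: func(*args), number=number)


if __name__ == "__main__":
    a, b = "aaabbbc", "abcabcabc"
    baseline = time_call(ndiff_only, a, b)
    total = time_call(levenshtein_distance, a, b)
    print(f"ndiff alone: {baseline:.4f}s, full function: {total:.4f}s")
```

The ratio of the two numbers gives a rough share of the run time attributable to ndiff. Results will vary between machines and Python versions.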












• Awesome suggestions. I had not even considered the generator approach, but it looks very nice. Thanks. – Kyra_W, 2 days ago

























          answered Apr 8 at 13:51









          Maarten Fabré





















