Comparison of these two algorithms?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
16
down vote

favorite












So I'm presented with a problem that states. "Determine if a string contains all unique characters"



So I wrote up this solution that adds each character to a set, but if the character already exists it returns false.



private static boolean allUniqueCharacters(String s) 

Set<Character> charSet = new HashSet<Character>();
for (int i = 0; i < s.length(); i++)
char currentChar = s.charAt(i);
if (!charSet.contains(currentChar))
charSet.add(currentChar);

else
return false;



return true;




According to the book I am reading this is the "optimal solution"



public static boolean isUniqueChars2(String str) 
if (str.length() > 128)
return false;

boolean char_set = new boolean[128];

for (int i = 0; i < str.length(); i++)
int val = str.charAt(i);

if (char_set[val])
return false;

char_set[val] = true;


return true;



My question is, is my implementation slower than the one presented? I assume it is, but if a Hash look up is O(1) wouldn't they be the same complexity?



Thank you.







share|improve this question


















  • 15




    Well, the book "solution" fails in Java because characters are 16 bits and can have a value from 0 to 65535, so a 128-byte array won't work.
    – Jim Garrison
    Aug 21 at 7:43






  • 21




    They are of same complexity. Not necessarily of same speed. Carrying 10 vases up a staircase has the same complexity as carrying 10 pianos up a staircase, but the effort involved is somewhat different. Hash O(1) is slower than array O(1), given that hash needs to, at least, calculate the hash. Also, hash is amortised O(1), not true O(1); it does not matter complexity-wise, but it does affect speed.
    – Amadan
    Aug 21 at 7:43







  • 5




    Apples and oranges. Your implementation is more general since it doesn't assume this is pure ascii. One performance improvement you can do is getting rid of the contains operation and simply comparing the size of the set and the string. As soon as they start to differ, return false.
    – Jilles van Gurp
    Aug 21 at 8:54







  • 7




    I suggest throwing the book away if the author thinks there are only 128 characters
    – phuclv
    Aug 21 at 9:33






  • 2




    To be fair your algorithm calculates the hash twice for the contains() and the add(). The add() method returns whether the element was already contained so no need for the extra contains() check.
    – Nathan Adams
    Aug 21 at 9:43














up vote
16
down vote

favorite












So I'm presented with a problem that states. "Determine if a string contains all unique characters"



So I wrote up this solution that adds each character to a set, but if the character already exists it returns false.



private static boolean allUniqueCharacters(String s) 

Set<Character> charSet = new HashSet<Character>();
for (int i = 0; i < s.length(); i++)
char currentChar = s.charAt(i);
if (!charSet.contains(currentChar))
charSet.add(currentChar);

else
return false;



return true;




According to the book I am reading this is the "optimal solution"



public static boolean isUniqueChars2(String str) 
if (str.length() > 128)
return false;

boolean char_set = new boolean[128];

for (int i = 0; i < str.length(); i++)
int val = str.charAt(i);

if (char_set[val])
return false;

char_set[val] = true;


return true;



My question is, is my implementation slower than the one presented? I assume it is, but if a Hash look up is O(1) wouldn't they be the same complexity?



Thank you.







share|improve this question


















  • 15




    Well, the book "solution" fails in Java because characters are 16 bits and can have a value from 0 to 65535, so a 128-byte array won't work.
    – Jim Garrison
    Aug 21 at 7:43






  • 21




    They are of same complexity. Not necessarily of same speed. Carrying 10 vases up a staircase has the same complexity as carrying 10 pianos up a staircase, but the effort involved is somewhat different. Hash O(1) is slower than array O(1), given that hash needs to, at least, calculate the hash. Also, hash is amortised O(1), not true O(1); it does not matter complexity-wise, but it does affect speed.
    – Amadan
    Aug 21 at 7:43







  • 5




    Apples and oranges. Your implementation is more general since it doesn't assume this is pure ascii. One performance improvement you can do is getting rid of the contains operation and simply comparing the size of the set and the string. As soon as they start to differ, return false.
    – Jilles van Gurp
    Aug 21 at 8:54







  • 7




    I suggest throwing the book away if the author thinks there are only 128 characters
    – phuclv
    Aug 21 at 9:33






  • 2




    To be fair your algorithm calculates the hash twice for the contains() and the add(). The add() method returns whether the element was already contained so no need for the extra contains() check.
    – Nathan Adams
    Aug 21 at 9:43












up vote
16
down vote

favorite









up vote
16
down vote

favorite











So I'm presented with a problem that states. "Determine if a string contains all unique characters"



So I wrote up this solution that adds each character to a set, but if the character already exists it returns false.



private static boolean allUniqueCharacters(String s) 

Set<Character> charSet = new HashSet<Character>();
for (int i = 0; i < s.length(); i++)
char currentChar = s.charAt(i);
if (!charSet.contains(currentChar))
charSet.add(currentChar);

else
return false;



return true;




According to the book I am reading this is the "optimal solution"



public static boolean isUniqueChars2(String str) 
if (str.length() > 128)
return false;

boolean char_set = new boolean[128];

for (int i = 0; i < str.length(); i++)
int val = str.charAt(i);

if (char_set[val])
return false;

char_set[val] = true;


return true;



My question is, is my implementation slower than the one presented? I assume it is, but if a Hash look up is O(1) wouldn't they be the same complexity?



Thank you.







share|improve this question














So I'm presented with a problem that states. "Determine if a string contains all unique characters"



So I wrote up this solution that adds each character to a set, but if the character already exists it returns false.



private static boolean allUniqueCharacters(String s) 

Set<Character> charSet = new HashSet<Character>();
for (int i = 0; i < s.length(); i++)
char currentChar = s.charAt(i);
if (!charSet.contains(currentChar))
charSet.add(currentChar);

else
return false;



return true;




According to the book I am reading this is the "optimal solution"



public static boolean isUniqueChars2(String str) 
if (str.length() > 128)
return false;

boolean char_set = new boolean[128];

for (int i = 0; i < str.length(); i++)
int val = str.charAt(i);

if (char_set[val])
return false;

char_set[val] = true;


return true;



My question is, is my implementation slower than the one presented? I assume it is, but if a Hash look up is O(1) wouldn't they be the same complexity?



Thank you.









share|improve this question













share|improve this question




share|improve this question








edited Aug 21 at 7:54









OldCurmudgeon

50.2k1179158




50.2k1179158










asked Aug 21 at 7:39









fsdff

858




858







  • 15




    Well, the book "solution" fails in Java because characters are 16 bits and can have a value from 0 to 65535, so a 128-byte array won't work.
    – Jim Garrison
    Aug 21 at 7:43






  • 21




    They are of same complexity. Not necessarily of same speed. Carrying 10 vases up a staircase has the same complexity as carrying 10 pianos up a staircase, but the effort involved is somewhat different. Hash O(1) is slower than array O(1), given that hash needs to, at least, calculate the hash. Also, hash is amortised O(1), not true O(1); it does not matter complexity-wise, but it does affect speed.
    – Amadan
    Aug 21 at 7:43







  • 5




    Apples and oranges. Your implementation is more general since it doesn't assume this is pure ascii. One performance improvement you can do is getting rid of the contains operation and simply comparing the size of the set and the string. As soon as they start to differ, return false.
    – Jilles van Gurp
    Aug 21 at 8:54







  • 7




    I suggest throwing the book away if the author thinks there are only 128 characters
    – phuclv
    Aug 21 at 9:33






  • 2




    To be fair your algorithm calculates the hash twice for the contains() and the add(). The add() method returns whether the element was already contained so no need for the extra contains() check.
    – Nathan Adams
    Aug 21 at 9:43












  • 15




    Well, the book "solution" fails in Java because characters are 16 bits and can have a value from 0 to 65535, so a 128-byte array won't work.
    – Jim Garrison
    Aug 21 at 7:43






  • 21




    They are of same complexity. Not necessarily of same speed. Carrying 10 vases up a staircase has the same complexity as carrying 10 pianos up a staircase, but the effort involved is somewhat different. Hash O(1) is slower than array O(1), given that hash needs to, at least, calculate the hash. Also, hash is amortised O(1), not true O(1); it does not matter complexity-wise, but it does affect speed.
    – Amadan
    Aug 21 at 7:43







  • 5




    Apples and oranges. Your implementation is more general since it doesn't assume this is pure ascii. One performance improvement you can do is getting rid of the contains operation and simply comparing the size of the set and the string. As soon as they start to differ, return false.
    – Jilles van Gurp
    Aug 21 at 8:54







  • 7




    I suggest throwing the book away if the author thinks there are only 128 characters
    – phuclv
    Aug 21 at 9:33






  • 2




    To be fair your algorithm calculates the hash twice for the contains() and the add(). The add() method returns whether the element was already contained so no need for the extra contains() check.
    – Nathan Adams
    Aug 21 at 9:43







15




15




Well, the book "solution" fails in Java because characters are 16 bits and can have a value from 0 to 65535, so a 128-byte array won't work.
– Jim Garrison
Aug 21 at 7:43




Well, the book "solution" fails in Java because characters are 16 bits and can have a value from 0 to 65535, so a 128-byte array won't work.
– Jim Garrison
Aug 21 at 7:43




21




21




They are of same complexity. Not necessarily of same speed. Carrying 10 vases up a staircase has the same complexity as carrying 10 pianos up a staircase, but the effort involved is somewhat different. Hash O(1) is slower than array O(1), given that hash needs to, at least, calculate the hash. Also, hash is amortised O(1), not true O(1); it does not matter complexity-wise, but it does affect speed.
– Amadan
Aug 21 at 7:43





They are of same complexity. Not necessarily of same speed. Carrying 10 vases up a staircase has the same complexity as carrying 10 pianos up a staircase, but the effort involved is somewhat different. Hash O(1) is slower than array O(1), given that hash needs to, at least, calculate the hash. Also, hash is amortised O(1), not true O(1); it does not matter complexity-wise, but it does affect speed.
– Amadan
Aug 21 at 7:43





5




5




Apples and oranges. Your implementation is more general since it doesn't assume this is pure ascii. One performance improvement you can do is getting rid of the contains operation and simply comparing the size of the set and the string. As soon as they start to differ, return false.
– Jilles van Gurp
Aug 21 at 8:54





Apples and oranges. Your implementation is more general since it doesn't assume this is pure ascii. One performance improvement you can do is getting rid of the contains operation and simply comparing the size of the set and the string. As soon as they start to differ, return false.
– Jilles van Gurp
Aug 21 at 8:54





7




7




I suggest throwing the book away if the author thinks there are only 128 characters
– phuclv
Aug 21 at 9:33




I suggest throwing the book away if the author thinks there are only 128 characters
– phuclv
Aug 21 at 9:33




2




2




To be fair your algorithm calculates the hash twice for the contains() and the add(). The add() method returns whether the element was already contained so no need for the extra contains() check.
– Nathan Adams
Aug 21 at 9:43




To be fair your algorithm calculates the hash twice for the contains() and the add(). The add() method returns whether the element was already contained so no need for the extra contains() check.
– Nathan Adams
Aug 21 at 9:43












6 Answers
6






active

oldest

votes

















up vote
12
down vote



accepted










As Amadan said in the comments, the two solutions have the same time complexity O(n) because you have a for loop looping through the string, and you do constant time operations in the for loop. This means that the time it takes to run your methods increases linearly with the length of the string.



Note that time complexity is all about how the time it takes changes when you change the size of the input. It's not about how fast it is with data of the same size.



For the same string, the "optimal" solution should be faster because sets have some overheads over arrays. Handling arrays is faster than handling sets. However, to actually make the "optimal" solution work, you would need an array of length 2^16. That is how many different char values there are. You would also need to remove the check for a string longer than 128.



This is one of the many examples of the tradeoff between space and time. If you want it to go faster, you need more space. If you want to save space, you have to go slower.






share|improve this answer



























    up vote
    3
    down vote













    Both algorithms have time complexity of O(N). The difference is in their space complexity.



    The book's solution will always require storage for 128 characters - O(1), while your solution's space requirement will vary linearly according to the input - O(N).



    The book's space requirement is based on an assumed character set with 128 characters. But this may be rather problematic (and not scalable) given the likelihood of needing different character sets.






    share|improve this answer






















    • in an interview would my solution be unacceptable or trivial?
      – fsdff
      Aug 21 at 7:54










    • @fsdff that will depend entirely on what the interviewer expects, so it's very subjective. But your solution seems very normal/common to me.
      – ernest_k
      Aug 21 at 7:57










    • I don't see how the OP's solution would be O(N). When all possible keys have been entered in the hashmap, it will not grow anymore. The required space is O(1) as for the array.
      – Yves Daoust
      Aug 21 at 8:23










    • @YvesDaoust You're assuming some character set, that makes sense... except that the point of complexity analysis is getting an idea of worst-case scenarios. And whether the character set is finite or not, it's a fact in this case that space complexity depends on input, and no explicit mention was made about the size of the character set.
      – ernest_k
      Aug 21 at 8:30










    • @ernest_k: whatever the alphabet size, let M, there is no difference between the space complexities: both O(M) (or O(M Log M) if you want to be picky), contrary to what you say. And no dependency on the size of the input.
      – Yves Daoust
      Aug 21 at 8:33


















    up vote
    2
    down vote













    The hashmap is in theory acceptable, but is a waste.



    A hashmap is built over an array (so it is certainly more costly than an array), and collision resolution requires extra space (at least the double of the number of elements). In addition, any access requires the computation of the hash and possibly the resolution of collisions.



    This adds a lot of overhead in terms of space and time, compared to a straight array.



    Also note that it is kind of folklore that a hash table has an O(1) behavior. The worst case is much poorer, accesses can take up to O(N) time for a table of size N.




    As a final remark, the time complexity of this algorithm is O(1) because you conclude false at worse when N>128.






    share|improve this answer






















    • While the worst case for individual accesses to a hash table is worse than O(1), this is entirely irrelevant here because the algorithm performs Θ(n) accesses, not a single one. Unless a piss-poor hash algorithm or collision avoidance scheme is used, the expected runtime of the algorithm is very much still O(n). And I’m curious how you intend to improve the asymptotic space complexity, which you’ve omitted to explain in your answer.
      – Konrad Rudolph
      Aug 21 at 13:58











    • @KonradRudolph: for instance a hash function returning a constant. In this question, there is nothing that can be improved asymptotically. I am just saying that hashing brings no practical benefit, on the opposite.
      – Yves Daoust
      Aug 21 at 14:03










    • But this isn’t the hash function being used here, and this isn’t how the complexity of the hash table operations are usually (or usefully) calculated. But OK, you say hashing brings no practical benefits; in that case please enlighten me how you’d implement this algorithm for a Unicode alphabet.
      – Konrad Rudolph
      Aug 21 at 14:08











    • @KonradRudolph: the question is about ASCII.
      – Yves Daoust
      Aug 21 at 14:17










    • It was my impression that the comments under the question established that the question was not about ASCII, and that, rather, the answer given in the book is simply wrong. At any rate, ASCII contains 127 valid characters, not 128 so at best the answer in the book is still incompetent.
      – Konrad Rudolph
      Aug 21 at 14:21


















    up vote
    1
    down vote













    Your algorithm is also O(1). You can think about complexity like how my algorithm will react to the change in amount of elements processed. Therefore O(n) and O(2n) are effectively equal.



    People are talking about O notation as growth rate here






    share|improve this answer




















    • Not O(1). Definitely. Unless there is an upper bound on input size.
      – user202729
      Aug 21 at 9:27

















    up vote
    1
    down vote













    Your solution is could indeed be slower than the book's solution. Firstly, a hash lookup ideally has a constant time lookup. But, the retrieval of the object will not be if there are multiple hash collisions. Secondly, even if it is constant time lookup, there is usually significant overhead involved in executing the hash code function as compared to looking up an element in an array by index. That's why you may want to go with the array lookup. However, if you start to deal with non-ASCII Unicode characters, then you might not want to go with the array approach due to the significant amount of space overhead.






    share|improve this answer



























      up vote
      0
      down vote













      The bottleneck of your implementation is, that a set has a lookup (and insert) complexity* of O(log k), while the array has a lookup complexity in O(1).



      This sounds like your algorithm must be much worse. But in fact it is not, as k is bounded by 128 (else the reference implementation would be wrong and produce a out-of-bounds error) and can be treated as a constant. This makes the set lookup O(1) as well with a bit bigger constants than the array lookup.



      * assuming a sane implementation as tree or hashmap. The hashmap time complexity is in general not constant, as filling it up needs log(n) resize operations to avoid the increase of collisions which would lead to linear lookup time, see e.g. here and here for answers on stackoverflow.



      This article even explains that java 8 by itself converts a hashmap to a binary tree (O(n log n) for the converstion, O(log n) for the lookup) before its lookup time degenerates to O(n) because of too many collisions.






      share|improve this answer






















      • I moved this discussion into chat.
        – allo
        Aug 21 at 14:50











      Your Answer





      StackExchange.ifUsing("editor", function ()
      StackExchange.using("externalEditor", function ()
      StackExchange.using("snippets", function ()
      StackExchange.snippets.init();
      );
      );
      , "code-snippets");

      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "1"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      convertImagesToLinks: true,
      noModals: false,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );








       

      draft saved


      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f51943836%2fcomparison-of-these-two-algorithms%23new-answer', 'question_page');

      );

      Post as a guest






























      6 Answers
      6






      active

      oldest

      votes








      6 Answers
      6






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      12
      down vote



      accepted










      As Amadan said in the comments, the two solutions have the same time complexity O(n) because you have a for loop looping through the string, and you do constant time operations in the for loop. This means that the time it takes to run your methods increases linearly with the length of the string.



      Note that time complexity is all about how the time it takes changes when you change the size of the input. It's not about how fast it is with data of the same size.



      For the same string, the "optimal" solution should be faster because sets have some overheads over arrays. Handling arrays is faster than handling sets. However, to actually make the "optimal" solution work, you would need an array of length 2^16. That is how many different char values there are. You would also need to remove the check for a string longer than 128.



      This is one of the many examples of the tradeoff between space and time. If you want it to go faster, you need more space. If you want to save space, you have to go slower.






      share|improve this answer
























        up vote
        12
        down vote



        accepted










        As Amadan said in the comments, the two solutions have the same time complexity O(n) because you have a for loop looping through the string, and you do constant time operations in the for loop. This means that the time it takes to run your methods increases linearly with the length of the string.



        Note that time complexity is all about how the time it takes changes when you change the size of the input. It's not about how fast it is with data of the same size.



        For the same string, the "optimal" solution should be faster because sets have some overheads over arrays. Handling arrays is faster than handling sets. However, to actually make the "optimal" solution work, you would need an array of length 2^16. That is how many different char values there are. You would also need to remove the check for a string longer than 128.



        This is one of the many examples of the tradeoff between space and time. If you want it to go faster, you need more space. If you want to save space, you have to go slower.






        share|improve this answer






















          up vote
          12
          down vote



          accepted







          up vote
          12
          down vote



          accepted






          As Amadan said in the comments, the two solutions have the same time complexity O(n) because you have a for loop looping through the string, and you do constant time operations in the for loop. This means that the time it takes to run your methods increases linearly with the length of the string.



          Note that time complexity is all about how the time it takes changes when you change the size of the input. It's not about how fast it is with data of the same size.



          For the same string, the "optimal" solution should be faster because sets have some overheads over arrays. Handling arrays is faster than handling sets. However, to actually make the "optimal" solution work, you would need an array of length 2^16. That is how many different char values there are. You would also need to remove the check for a string longer than 128.



          This is one of the many examples of the tradeoff between space and time. If you want it to go faster, you need more space. If you want to save space, you have to go slower.






          share|improve this answer












          As Amadan said in the comments, the two solutions have the same time complexity O(n) because you have a for loop looping through the string, and you do constant time operations in the for loop. This means that the time it takes to run your methods increases linearly with the length of the string.



          Note that time complexity is all about how the time it takes changes when you change the size of the input. It's not about how fast it is with data of the same size.



          For the same string, the "optimal" solution should be faster because sets have some overheads over arrays. Handling arrays is faster than handling sets. However, to actually make the "optimal" solution work, you would need an array of length 2^16. That is how many different char values there are. You would also need to remove the check for a string longer than 128.



          This is one of the many examples of the tradeoff between space and time. If you want it to go faster, you need more space. If you want to save space, you have to go slower.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Aug 21 at 7:58









          Sweeper

          54.2k960116




          54.2k960116






















              up vote
              3
              down vote













              Both algorithms have time complexity of O(N). The difference is in their space complexity.



              The book's solution will always require storage for 128 characters - O(1), while your solution's space requirement will vary linearly according to the input - O(N).



              The book's space requirement is based on an assumed character set with 128 characters. But this may be rather problematic (and not scalable) given the likelihood of needing different character sets.






              share|improve this answer






















              • in an interview would my solution be unacceptable or trivial?
                – fsdff
                Aug 21 at 7:54










              • @fsdff that will depend entirely on what the interviewer expects, so it's very subjective. But your solution seems very normal/common to me.
                – ernest_k
                Aug 21 at 7:57










              • I don't see how the OP's solution would be O(N). When all possible keys have been entered in the hashmap, it will not grow anymore. The required space is O(1) as for the array.
                – Yves Daoust
                Aug 21 at 8:23










              • @YvesDaoust You're assuming some character set, that makes sense... except that the point of complexity analysis is getting an idea of worst-case scenarios. And whether the character set is finite or not, it's a fact in this case that space complexity depends on input, and no explicit mention was made about the size of the character set.
                – ernest_k
                Aug 21 at 8:30










              • @ernest_k: whatever the alphabet size, let M, there is no difference between the space complexities: both O(M) (or O(M Log M) if you want to be picky), contrary to what you say. And no dependency on the size of the input.
                – Yves Daoust
                Aug 21 at 8:33















              up vote
              3
              down vote













              Both algorithms have time complexity of O(N). The difference is in their space complexity.



              The book's solution will always require storage for 128 characters - O(1), while your solution's space requirement will vary linearly according to the input - O(N).



              The book's space requirement is based on an assumed character set with 128 characters. But this may be rather problematic (and not scalable) given the likelihood of needing different character sets.






              share|improve this answer






















              • in an interview would my solution be unacceptable or trivial?
                – fsdff
                Aug 21 at 7:54










              • @fsdff that will depend entirely on what the interviewer expects, so it's very subjective. But your solution seems very normal/common to me.
                – ernest_k
                Aug 21 at 7:57










              • I don't see how the OP's solution would be O(N). When all possible keys have been entered in the hashmap, it will not grow anymore. The required space is O(1) as for the array.
                – Yves Daoust
                Aug 21 at 8:23










              • @YvesDaoust You're assuming some character set, that makes sense... except that the point of complexity analysis is getting an idea of worst-case scenarios. And whether the character set is finite or not, it's a fact in this case that space complexity depends on input, and no explicit mention was made about the size of the character set.
                – ernest_k
                Aug 21 at 8:30










              • @ernest_k: whatever the alphabet size, let M, there is no difference between the space complexities: both O(M) (or O(M Log M) if you want to be picky), contrary to what you say. And no dependency on the size of the input.
                – Yves Daoust
                Aug 21 at 8:33













              up vote
              3
              down vote










              up vote
              3
              down vote









              Both algorithms have time complexity of O(N). The difference is in their space complexity.



              The book's solution will always require storage for 128 characters - O(1), while your solution's space requirement will vary linearly according to the input - O(N).



              The book's space requirement is based on an assumed character set with 128 characters. But this may be rather problematic (and not scalable) given the likelihood of needing different character sets.






              share|improve this answer














              Both algorithms have time complexity of O(N). The difference is in their space complexity.



              The book's solution will always require storage for 128 characters - O(1), while your solution's space requirement will vary linearly according to the input - O(N).



              The book's space requirement is based on an assumed character set with 128 characters. But this may be rather problematic (and not scalable) given the likelihood of needing different character sets.







              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Aug 21 at 8:02

























              answered Aug 21 at 7:52









              ernest_k

              13.3k31426




              13.3k31426











              • in an interview would my solution be unacceptable or trivial?
                – fsdff
                Aug 21 at 7:54










              • @fsdff that will depend entirely on what the interviewer expects, so it's very subjective. But your solution seems very normal/common to me.
                – ernest_k
                Aug 21 at 7:57










              • I don't see how the OP's solution would be O(N). When all possible keys have been entered in the hashmap, it will not grow anymore. The required space is O(1) as for the array.
                – Yves Daoust
                Aug 21 at 8:23










              • @YvesDaoust You're assuming some character set, that makes sense... except that the point of complexity analysis is getting an idea of worst-case scenarios. And whether the character set is finite or not, it's a fact in this case that space complexity depends on input, and no explicit mention was made about the size of the character set.
                – ernest_k
                Aug 21 at 8:30










              • @ernest_k: whatever the alphabet size, let M, there is no difference between the space complexities: both O(M) (or O(M Log M) if you want to be picky), contrary to what you say. And no dependency on the size of the input.
                – Yves Daoust
                Aug 21 at 8:33

















              • in an interview would my solution be unacceptable or trivial?
                – fsdff
                Aug 21 at 7:54










              • @fsdff that will depend entirely on what the interviewer expects, so it's very subjective. But your solution seems very normal/common to me.
                – ernest_k
                Aug 21 at 7:57










              • I don't see how the OP's solution would be O(N). When all possible keys have been entered in the hashmap, it will not grow anymore. The required space is O(1) as for the array.
                – Yves Daoust
                Aug 21 at 8:23










              • @YvesDaoust You're assuming some character set, that makes sense... except that the point of complexity analysis is getting an idea of worst-case scenarios. And whether the character set is finite or not, it's a fact in this case that space complexity depends on input, and no explicit mention was made about the size of the character set.
                – ernest_k
                Aug 21 at 8:30










              • @ernest_k: whatever the alphabet size, let M, there is no difference between the space complexities: both O(M) (or O(M Log M) if you want to be picky), contrary to what you say. And no dependency on the size of the input.
                – Yves Daoust
                Aug 21 at 8:33
















              in an interview would my solution be unacceptable or trivial?
              – fsdff
              Aug 21 at 7:54




              in an interview would my solution be unacceptable or trivial?
              – fsdff
              Aug 21 at 7:54












              @fsdff that will depend entirely on what the interviewer expects, so it's very subjective. But your solution seems very normal/common to me.
              – ernest_k
              Aug 21 at 7:57




              @fsdff that will depend entirely on what the interviewer expects, so it's very subjective. But your solution seems very normal/common to me.
              – ernest_k
              Aug 21 at 7:57












              I don't see how the OP's solution would be O(N). When all possible keys have been entered in the hashmap, it will not grow anymore. The required space is O(1) as for the array.
              – Yves Daoust
              Aug 21 at 8:23




              I don't see how the OP's solution would be O(N). When all possible keys have been entered in the hashmap, it will not grow anymore. The required space is O(1) as for the array.
              – Yves Daoust
              Aug 21 at 8:23












              @YvesDaoust You're assuming some character set, that makes sense... except that the point of complexity analysis is getting an idea of worst-case scenarios. And whether the character set is finite or not, it's a fact in this case that space complexity depends on input, and no explicit mention was made about the size of the character set.
              – ernest_k
              Aug 21 at 8:30




              @YvesDaoust You're assuming some character set, that makes sense... except that the point of complexity analysis is getting an idea of worst-case scenarios. And whether the character set is finite or not, it's a fact in this case that space complexity depends on input, and no explicit mention was made about the size of the character set.
              – ernest_k
              Aug 21 at 8:30












              @ernest_k: whatever the alphabet size, let M, there is no difference between the space complexities: both O(M) (or O(M Log M) if you want to be picky), contrary to what you say. And no dependency on the size of the input.
              – Yves Daoust
              Aug 21 at 8:33





              @ernest_k: whatever the alphabet size, let M, there is no difference between the space complexities: both O(M) (or O(M Log M) if you want to be picky), contrary to what you say. And no dependency on the size of the input.
              – Yves Daoust
              Aug 21 at 8:33











              up vote
              2
              down vote













              The hashmap is in theory acceptable, but is a waste.



              A hashmap is built over an array (so it is certainly more costly than an array), and collision resolution requires extra space (at least the double of the number of elements). In addition, any access requires the computation of the hash and possibly the resolution of collisions.



              This adds a lot of overhead in terms of space and time, compared to a straight array.



              Also note that it is kind of folklore that a hash table has an O(1) behavior. The worst case is much poorer, accesses can take up to O(N) time for a table of size N.




              As a final remark, the time complexity of this algorithm is O(1) because you conclude false at worse when N>128.






              share|improve this answer






















              • While the worst case for individual accesses to a hash table is worse than O(1), this is entirely irrelevant here because the algorithm performs Θ(n) accesses, not a single one. Unless a piss-poor hash algorithm or collision avoidance scheme is used, the expected runtime of the algorithm is very much still O(n). And I’m curious how you intend to improve the asymptotic space complexity, which you’ve omitted to explain in your answer.
                – Konrad Rudolph
                Aug 21 at 13:58











              • @KonradRudolph: for instance a hash function returning a constant. In this question, there is nothing that can be improved asymptotically. I am just saying that hashing brings no practical benefit, on the opposite.
                – Yves Daoust
                Aug 21 at 14:03










              • But this isn’t the hash function being used here, and this isn’t how the complexity of the hash table operations are usually (or usefully) calculated. But OK, you say hashing brings no practical benefits; in that case please enlighten me how you’d implement this algorithm for a Unicode alphabet.
                – Konrad Rudolph
                Aug 21 at 14:08











              • @KonradRudolph: the question is about ASCII.
                – Yves Daoust
                Aug 21 at 14:17










              • It was my impression that the comments under the question established that the question was not about ASCII, and that, rather, the answer given in the book is simply wrong. At any rate, ASCII contains 127 valid characters, not 128 so at best the answer in the book is still incompetent.
                – Konrad Rudolph
                Aug 21 at 14:21















              up vote
              2
              down vote













              The hashmap is in theory acceptable, but is a waste.



              A hashmap is built over an array (so it is certainly more costly than an array), and collision resolution requires extra space (at least the double of the number of elements). In addition, any access requires the computation of the hash and possibly the resolution of collisions.



              This adds a lot of overhead in terms of space and time, compared to a straight array.



              Also note that it is kind of folklore that a hash table has an O(1) behavior. The worst case is much poorer, accesses can take up to O(N) time for a table of size N.




              As a final remark, the time complexity of this algorithm is O(1) because you conclude false at worse when N>128.






              share|improve this answer






















              • While the worst case for individual accesses to a hash table is worse than O(1), this is entirely irrelevant here because the algorithm performs Θ(n) accesses, not a single one. Unless a piss-poor hash algorithm or collision avoidance scheme is used, the expected runtime of the algorithm is very much still O(n). And I’m curious how you intend to improve the asymptotic space complexity, which you’ve omitted to explain in your answer.
                – Konrad Rudolph
                Aug 21 at 13:58











              • @KonradRudolph: for instance a hash function returning a constant. In this question, there is nothing that can be improved asymptotically. I am just saying that hashing brings no practical benefit, on the opposite.
                – Yves Daoust
                Aug 21 at 14:03










              • But this isn’t the hash function being used here, and this isn’t how the complexity of the hash table operations are usually (or usefully) calculated. But OK, you say hashing brings no practical benefits; in that case please enlighten me how you’d implement this algorithm for a Unicode alphabet.
                – Konrad Rudolph
                Aug 21 at 14:08











              • @KonradRudolph: the question is about ASCII.
                – Yves Daoust
                Aug 21 at 14:17










              • It was my impression that the comments under the question established that the question was not about ASCII, and that, rather, the answer given in the book is simply wrong. At any rate, ASCII contains 127 valid characters, not 128 so at best the answer in the book is still incompetent.
                – Konrad Rudolph
                Aug 21 at 14:21













              up vote
              2
              down vote










              up vote
              2
              down vote









              The hashmap is in theory acceptable, but is a waste.



              A hashmap is built over an array (so it is certainly more costly than an array), and collision resolution requires extra space (at least the double of the number of elements). In addition, any access requires the computation of the hash and possibly the resolution of collisions.



              This adds a lot of overhead in terms of space and time, compared to a straight array.



              Also note that it is kind of folklore that a hash table has an O(1) behavior. The worst case is much poorer, accesses can take up to O(N) time for a table of size N.




              As a final remark, the time complexity of this algorithm is O(1) because you conclude false at worse when N>128.






              share|improve this answer














              The hashmap is in theory acceptable, but is a waste.



              A hashmap is built over an array (so it is certainly more costly than an array), and collision resolution requires extra space (at least the double of the number of elements). In addition, any access requires the computation of the hash and possibly the resolution of collisions.



              This adds a lot of overhead in terms of space and time, compared to a straight array.



              Also note that it is kind of folklore that a hash table has an O(1) behavior. The worst case is much poorer, accesses can take up to O(N) time for a table of size N.




              As a final remark, the time complexity of this algorithm is O(1) because you conclude false at worse when N>128.







              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Aug 21 at 13:34

























              answered Aug 21 at 8:21









              Yves Daoust

              34.6k62454




              34.6k62454











              • While the worst case for individual accesses to a hash table is worse than O(1), this is entirely irrelevant here because the algorithm performs Θ(n) accesses, not a single one. Unless a piss-poor hash algorithm or collision avoidance scheme is used, the expected runtime of the algorithm is very much still O(n). And I’m curious how you intend to improve the asymptotic space complexity, which you’ve omitted to explain in your answer.
                – Konrad Rudolph
                Aug 21 at 13:58











              • @KonradRudolph: for instance a hash function returning a constant. In this question, there is nothing that can be improved asymptotically. I am just saying that hashing brings no practical benefit, on the opposite.
                – Yves Daoust
                Aug 21 at 14:03










              • But this isn’t the hash function being used here, and this isn’t how the complexity of the hash table operations are usually (or usefully) calculated. But OK, you say hashing brings no practical benefits; in that case please enlighten me how you’d implement this algorithm for a Unicode alphabet.
                – Konrad Rudolph
                Aug 21 at 14:08











              • @KonradRudolph: the question is about ASCII.
                – Yves Daoust
                Aug 21 at 14:17










              • It was my impression that the comments under the question established that the question was not about ASCII, and that, rather, the answer given in the book is simply wrong. At any rate, ASCII contains 127 valid characters, not 128 so at best the answer in the book is still incompetent.
                – Konrad Rudolph
                Aug 21 at 14:21

















              • While the worst case for individual accesses to a hash table is worse than O(1), this is entirely irrelevant here because the algorithm performs Θ(n) accesses, not a single one. Unless a piss-poor hash algorithm or collision avoidance scheme is used, the expected runtime of the algorithm is very much still O(n). And I’m curious how you intend to improve the asymptotic space complexity, which you’ve omitted to explain in your answer.
                – Konrad Rudolph
                Aug 21 at 13:58











              • @KonradRudolph: for instance a hash function returning a constant. In this question, there is nothing that can be improved asymptotically. I am just saying that hashing brings no practical benefit, on the opposite.
                – Yves Daoust
                Aug 21 at 14:03










              • But this isn’t the hash function being used here, and this isn’t how the complexity of the hash table operations are usually (or usefully) calculated. But OK, you say hashing brings no practical benefits; in that case please enlighten me how you’d implement this algorithm for a Unicode alphabet.
                – Konrad Rudolph
                Aug 21 at 14:08











              • @KonradRudolph: the question is about ASCII.
                – Yves Daoust
                Aug 21 at 14:17










              • It was my impression that the comments under the question established that the question was not about ASCII, and that, rather, the answer given in the book is simply wrong. At any rate, ASCII contains 127 valid characters, not 128 so at best the answer in the book is still incompetent.
                – Konrad Rudolph
                Aug 21 at 14:21
















              While the worst case for individual accesses to a hash table is worse than O(1), this is entirely irrelevant here because the algorithm performs Θ(n) accesses, not a single one. Unless a piss-poor hash algorithm or collision avoidance scheme is used, the expected runtime of the algorithm is very much still O(n). And I’m curious how you intend to improve the asymptotic space complexity, which you’ve omitted to explain in your answer.
              – Konrad Rudolph
              Aug 21 at 13:58





              While the worst case for individual accesses to a hash table is worse than O(1), this is entirely irrelevant here because the algorithm performs Θ(n) accesses, not a single one. Unless a piss-poor hash algorithm or collision avoidance scheme is used, the expected runtime of the algorithm is very much still O(n). And I’m curious how you intend to improve the asymptotic space complexity, which you’ve omitted to explain in your answer.
              – Konrad Rudolph
              Aug 21 at 13:58













              @KonradRudolph: for instance a hash function returning a constant. In this question, there is nothing that can be improved asymptotically. I am just saying that hashing brings no practical benefit, on the opposite.
              – Yves Daoust
              Aug 21 at 14:03




              @KonradRudolph: for instance a hash function returning a constant. In this question, there is nothing that can be improved asymptotically. I am just saying that hashing brings no practical benefit, on the opposite.
              – Yves Daoust
              Aug 21 at 14:03












              But this isn’t the hash function being used here, and this isn’t how the complexity of the hash table operations are usually (or usefully) calculated. But OK, you say hashing brings no practical benefits; in that case please enlighten me how you’d implement this algorithm for a Unicode alphabet.
              – Konrad Rudolph
              Aug 21 at 14:08





              But this isn’t the hash function being used here, and this isn’t how the complexity of the hash table operations are usually (or usefully) calculated. But OK, you say hashing brings no practical benefits; in that case please enlighten me how you’d implement this algorithm for a Unicode alphabet.
              – Konrad Rudolph
              Aug 21 at 14:08













              @KonradRudolph: the question is about ASCII.
              – Yves Daoust
              Aug 21 at 14:17




              @KonradRudolph: the question is about ASCII.
              – Yves Daoust
              Aug 21 at 14:17












              It was my impression that the comments under the question established that the question was not about ASCII, and that, rather, the answer given in the book is simply wrong. At any rate, ASCII contains 127 valid characters, not 128 so at best the answer in the book is still incompetent.
              – Konrad Rudolph
              Aug 21 at 14:21





              It was my impression that the comments under the question established that the question was not about ASCII, and that, rather, the answer given in the book is simply wrong. At any rate, ASCII contains 127 valid characters, not 128 so at best the answer in the book is still incompetent.
              – Konrad Rudolph
              Aug 21 at 14:21











              up vote
              1
              down vote













              Your algorithm is also O(1). You can think about complexity like how my algorithm will react to the change in amount of elements processed. Therefore O(n) and O(2n) are effectively equal.



              People are talking about O notation as growth rate here






              share|improve this answer




















              • Not O(1). Definitely. Unless there is an upper bound on input size.
                – user202729
                Aug 21 at 9:27














              up vote
              1
              down vote













              Your algorithm is also O(1). You can think about complexity like how my algorithm will react to the change in amount of elements processed. Therefore O(n) and O(2n) are effectively equal.



              People are talking about O notation as growth rate here






              share|improve this answer




















              • Not O(1). Definitely. Unless there is an upper bound on input size.
                – user202729
                Aug 21 at 9:27












              up vote
              1
              down vote










              up vote
              1
              down vote









              Your algorithm is also O(1). You can think about complexity like how my algorithm will react to the change in amount of elements processed. Therefore O(n) and O(2n) are effectively equal.



              People are talking about O notation as growth rate here






              share|improve this answer












              Your algorithm is also O(1). You can think about complexity like how my algorithm will react to the change in amount of elements processed. Therefore O(n) and O(2n) are effectively equal.



              People are talking about O notation as growth rate here







              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered Aug 21 at 7:48









              amerykanin

              1251212




              1251212











              • Not O(1). Definitely. Unless there is an upper bound on input size.
                – user202729
                Aug 21 at 9:27
















              • Not O(1). Definitely. Unless there is an upper bound on input size.
                – user202729
                Aug 21 at 9:27















              Not O(1). Definitely. Unless there is an upper bound on input size.
              – user202729
              Aug 21 at 9:27




              Not O(1). Definitely. Unless there is an upper bound on input size.
              – user202729
              Aug 21 at 9:27










              up vote
              1
              down vote













              Your solution is could indeed be slower than the book's solution. Firstly, a hash lookup ideally has a constant time lookup. But, the retrieval of the object will not be if there are multiple hash collisions. Secondly, even if it is constant time lookup, there is usually significant overhead involved in executing the hash code function as compared to looking up an element in an array by index. That's why you may want to go with the array lookup. However, if you start to deal with non-ASCII Unicode characters, then you might not want to go with the array approach due to the significant amount of space overhead.






              share|improve this answer
























                up vote
                1
                down vote













                Your solution is could indeed be slower than the book's solution. Firstly, a hash lookup ideally has a constant time lookup. But, the retrieval of the object will not be if there are multiple hash collisions. Secondly, even if it is constant time lookup, there is usually significant overhead involved in executing the hash code function as compared to looking up an element in an array by index. That's why you may want to go with the array lookup. However, if you start to deal with non-ASCII Unicode characters, then you might not want to go with the array approach due to the significant amount of space overhead.






                share|improve this answer






















                  up vote
                  1
                  down vote










                  up vote
                  1
                  down vote









                  Your solution is could indeed be slower than the book's solution. Firstly, a hash lookup ideally has a constant time lookup. But, the retrieval of the object will not be if there are multiple hash collisions. Secondly, even if it is constant time lookup, there is usually significant overhead involved in executing the hash code function as compared to looking up an element in an array by index. That's why you may want to go with the array lookup. However, if you start to deal with non-ASCII Unicode characters, then you might not want to go with the array approach due to the significant amount of space overhead.






                  share|improve this answer












                  Your solution is could indeed be slower than the book's solution. Firstly, a hash lookup ideally has a constant time lookup. But, the retrieval of the object will not be if there are multiple hash collisions. Secondly, even if it is constant time lookup, there is usually significant overhead involved in executing the hash code function as compared to looking up an element in an array by index. That's why you may want to go with the array lookup. However, if you start to deal with non-ASCII Unicode characters, then you might not want to go with the array approach due to the significant amount of space overhead.







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Aug 21 at 7:52









                  entpnerd

                  4,20211640




                  4,20211640




















                      up vote
                      0
                      down vote













                      The bottleneck of your implementation is, that a set has a lookup (and insert) complexity* of O(log k), while the array has a lookup complexity in O(1).



                      This sounds like your algorithm must be much worse. But in fact it is not, as k is bounded by 128 (else the reference implementation would be wrong and produce a out-of-bounds error) and can be treated as a constant. This makes the set lookup O(1) as well with a bit bigger constants than the array lookup.



                      * assuming a sane implementation as tree or hashmap. The hashmap time complexity is in general not constant, as filling it up needs log(n) resize operations to avoid the increase of collisions which would lead to linear lookup time, see e.g. here and here for answers on stackoverflow.



                      This article even explains that java 8 by itself converts a hashmap to a binary tree (O(n log n) for the converstion, O(log n) for the lookup) before its lookup time degenerates to O(n) because of too many collisions.






                      share|improve this answer






















                      • I moved this discussion into chat.
                        – allo
                        Aug 21 at 14:50















                      up vote
                      0
                      down vote













                      The bottleneck of your implementation is, that a set has a lookup (and insert) complexity* of O(log k), while the array has a lookup complexity in O(1).



                      This sounds like your algorithm must be much worse. But in fact it is not, as k is bounded by 128 (else the reference implementation would be wrong and produce a out-of-bounds error) and can be treated as a constant. This makes the set lookup O(1) as well with a bit bigger constants than the array lookup.



                      * assuming a sane implementation as tree or hashmap. The hashmap time complexity is in general not constant, as filling it up needs log(n) resize operations to avoid the increase of collisions which would lead to linear lookup time, see e.g. here and here for answers on stackoverflow.



                      This article even explains that java 8 by itself converts a hashmap to a binary tree (O(n log n) for the converstion, O(log n) for the lookup) before its lookup time degenerates to O(n) because of too many collisions.






                      share|improve this answer






















                      • I moved this discussion into chat.
                        – allo
                        Aug 21 at 14:50













                      up vote
                      0
                      down vote










                      up vote
                      0
                      down vote









                      The bottleneck of your implementation is, that a set has a lookup (and insert) complexity* of O(log k), while the array has a lookup complexity in O(1).



                      This sounds like your algorithm must be much worse. But in fact it is not, as k is bounded by 128 (else the reference implementation would be wrong and produce a out-of-bounds error) and can be treated as a constant. This makes the set lookup O(1) as well with a bit bigger constants than the array lookup.



                      * assuming a sane implementation as tree or hashmap. The hashmap time complexity is in general not constant, as filling it up needs log(n) resize operations to avoid the increase of collisions which would lead to linear lookup time, see e.g. here and here for answers on stackoverflow.



                      This article even explains that java 8 by itself converts a hashmap to a binary tree (O(n log n) for the converstion, O(log n) for the lookup) before its lookup time degenerates to O(n) because of too many collisions.






                      share|improve this answer














                      The bottleneck of your implementation is, that a set has a lookup (and insert) complexity* of O(log k), while the array has a lookup complexity in O(1).



                      This sounds like your algorithm must be much worse. But in fact it is not, as k is bounded by 128 (else the reference implementation would be wrong and produce a out-of-bounds error) and can be treated as a constant. This makes the set lookup O(1) as well with a bit bigger constants than the array lookup.



                      * assuming a sane implementation as tree or hashmap. The hashmap time complexity is in general not constant, as filling it up needs log(n) resize operations to avoid the increase of collisions which would lead to linear lookup time, see e.g. here and here for answers on stackoverflow.



                      This article even explains that java 8 by itself converts a hashmap to a binary tree (O(n log n) for the converstion, O(log n) for the lookup) before its lookup time degenerates to O(n) because of too many collisions.







                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited Aug 21 at 14:39

























                      answered Aug 21 at 12:59









                      allo

                      1,58621727




                      1,58621727











                      • I moved this discussion into chat.
                        – allo
                        Aug 21 at 14:50

















                      • I moved this discussion into chat.
                        – allo
                        Aug 21 at 14:50
















                      I moved this discussion into chat.
                      – allo
                      Aug 21 at 14:50





                      I moved this discussion into chat.
                      – allo
                      Aug 21 at 14:50













                       

                      draft saved


                      draft discarded


























                       


                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function ()
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f51943836%2fcomparison-of-these-two-algorithms%23new-answer', 'question_page');

                      );

                      Post as a guest













































































                      這個網誌中的熱門文章

                      How to combine Bézier curves to a surface?

                      Mutual Information Always Non-negative

                      Why am i infinitely getting the same tweet with the Twitter Search API?