java - HashMap put performance varies based on keys


I am trying to load around 5 million objects, fetched from a DB via Hibernate, into a HashMap. There are 2 types of classes (A & B). I iterate through the POJOs; the key is a field of the POJO, and the value is the POJO itself.

1. For class type A, the key is an Integer field. I am able to load the map in less than 20 seconds.

For class B:
2.a) Test 1: the key is a String field. When I try to load these objects into a new HashMap (a fresh attempt after restarting the Java process, so no concern of GC yet), it takes 30 seconds to load 100k objects into the map.
2.b) Test 2: when I use a different field of the class (Integer type) as the key and load the map, it works like the 1st one and loads in less than 20 seconds.
2.c) Test 3: I wondered if the problem was the data type. So for class B, I tried creating a String key from the Integer field used in #2.b (key = int_field + ""), and it loaded in < 20 seconds.

Another test, test 4, that I did for class type B concerns the way I created the key. In 2.c, I created the key as
map.put( pojo.getIntField() + "", pojo);
The result is as mentioned above in 2.c.

2.d) When I created a getter in the POJO that returned int_field + "" and used it in the map put as
map.put( pojo.getIntFieldInStringForm(), pojo);
the performance deteriorated to around 30 secs for 100k objects.

I know the problem is with the keys because I have verified the DB fetch phase by adding the result objects to a list, and that loads in < 20 secs for both types.

I am not able to understand the reason for this. If anyone can please shed some light on this, it would be helpful and appreciated. Thanks.

Edited: adding code snippets here (forgive formatting/typos if any):
Test #1

    // Class A, keyed by its Integer field
    Map<Integer, ClassA> map = new HashMap<Integer, ClassA>();
    Session session = sessionFactory.openNewSession();
    try {
        Iterator<ClassA> iterator = session.createQuery( "from ClassA" ).setFetchSize( 1000 ).iterate();
        while ( iterator.hasNext() ) {
            ClassA objClassA = iterator.next();
            map.put( objClassA.getIntField(), objClassA );
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        session.close();
    }


Test #2.a

    // Class B, keyed by its String field
    Map<String, ClassB> map = new HashMap<String, ClassB>();
    Session session = sessionFactory.openNewSession();
    try {
        Iterator<ClassB> iterator = session.createQuery( "from ClassB" ).setFetchSize( 1000 ).iterate();
        while ( iterator.hasNext() ) {
            ClassB objClassB = iterator.next();
            map.put( objClassB.getStringField(), objClassB );
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        session.close();
    }


Test #2.b

    // Class B, keyed by its Integer field
    Map<Integer, ClassB> map = new HashMap<Integer, ClassB>();
    Session session = sessionFactory.openNewSession();
    try {
        Iterator<ClassB> iterator = session.createQuery( "from ClassB" ).setFetchSize( 1000 ).iterate();
        while ( iterator.hasNext() ) {
            ClassB objClassB = iterator.next();
            map.put( objClassB.getIntField(), objClassB );
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        session.close();
    }


Test #2.c

    // Class B, keyed by a String built inline from the Integer field
    Map<String, ClassB> map = new HashMap<String, ClassB>();
    Session session = sessionFactory.openNewSession();
    try {
        Iterator<ClassB> iterator = session.createQuery( "from ClassB" ).setFetchSize( 1000 ).iterate();
        while ( iterator.hasNext() ) {
            ClassB objClassB = iterator.next();
            map.put( objClassB.getIntField() + "", objClassB );
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        session.close();
    }


Test #2.d

    // Class B, keyed by the String returned from the POJO's getter
    Map<String, ClassB> map = new HashMap<String, ClassB>();
    Session session = sessionFactory.openNewSession();
    try {
        Iterator<ClassB> iterator = session.createQuery( "from ClassB" ).setFetchSize( 1000 ).iterate();
        while ( iterator.hasNext() ) {
            ClassB objClassB = iterator.next();
            map.put( objClassB.getIntFieldInStringForm() + "", objClassB );
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        session.close();
    }

To put items in a HashMap, the hashCode of the key needs to be calculated. If the strings are 8 - 10 chars long, there is a calculation that needs to be done to map them onto 32-bit hash codes. How large are the integer keys? If they are smaller than 100,000, there are at most 5 chars to calculate the hashCode from, so that's a little bit faster.
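As a rough illustration of that cost difference, the sketch below re-implements the polynomial documented in the String.hashCode() Javadoc and compares it with Integer.hashCode(). The names HashCodeSketch and manualStringHash are made up for this example and do not come from the question's code.

```java
// Sketch only: shows how each key type arrives at its 32-bit hash code.
final class HashCodeSketch {

    // Same formula the String.hashCode() Javadoc describes:
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], in int arithmetic.
    // The work grows with the length of the string.
    static int manualStringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String key = "12345";
        // The hand-written loop matches the library result.
        System.out.println(manualStringHash(key) == key.hashCode()); // true
        // An Integer's hash code is simply its int value, no loop needed.
        System.out.println(Integer.valueOf(12345).hashCode());       // 12345
    }
}
```

Note that a String caches its hash code after the first call, so this loop is paid once per String instance rather than on every lookup.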

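You also have a performance hit when 2 keys calculate to the same hashCode, which will happen a couple of times with String keys.

When you use unique integers as keys, two keys never produce the same hash code (Integer.hashCode() is just the int value), although distinct hash codes can still end up in the same bucket. And maybe, if you use strings converted from integers, the String hash algorithm has fewer collisions as well.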
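One way to check the collision theory against real keys is to count how many distinct hash codes a set of keys produces. This is a minimal sketch, assuming the keys can be held in memory; CollisionSketch and distinctHashCodes are hypothetical names, and the generated keys (0 to 99,999) only stand in for the real data.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: counts distinct hashCode() values for a collection of keys.
final class CollisionSketch {

    // Fewer distinct hash codes than keys means some keys collide.
    static int distinctHashCodes(Iterable<?> keys) {
        Set<Integer> seen = new HashSet<>();
        for (Object key : keys) {
            seen.add(key.hashCode());
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Two different strings can share one hash code.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true

        // Stand-in data: unique ints and their decimal string forms.
        List<Integer> intKeys = new ArrayList<>();
        List<String> stringKeys = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            intKeys.add(i);
            stringKeys.add(i + "");
        }
        System.out.println(distinctHashCodes(intKeys)); // 100000, one per key
        System.out.println(distinctHashCodes(stringKeys));
    }
}
```

Running the same count over the actual String field values from test 2.a would show whether those keys collide noticeably more often than the integer-derived strings.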

