I am trying to get highlighting right using Apache Solr. In case of a partial match, I want to highlight only the matching part of the word. Instead, the whole word (which partially matched the search term) is highlighted.
Example:
Searching for "adida shi" should yield 2 items, one named 'adidas shirts' and the other 'adidas red shirts'.
/select?q=name:adida+shi&hl=true&hl.fl=name&qt=standard&wt=json
Expected highlighting:
<em>adida</em>s <em>shi</em>rts
<em>adida</em>s red <em>shi</em>rts
Actual highlighting:
<em>adidas</em> <em>shirts</em>
<em>adidas</em> red <em>shirts</em>
The field used for highlighting is defined in schema.xml as follows:
<field name="name" type="autocomplete_text" indexed="true" stored="true"/>
The field type for this field looks like this:
<fieldType name="autocomplete_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
I don't have any specific highlighting configuration in the core config file (solrconfig.xml).
I am using Solr v6.0.1. Highlighting worked as expected in Solr v4.10.4 with the same configuration. I went through the following sections of the Solr wiki and tried various highlighting parameters, but couldn't make it work:
https://cwiki.apache.org/confluence/display/solr/highlighting
https://cwiki.apache.org/confluence/display/solr/standard+highlighter
Any ideas how to make this work?
Adding an answer to follow up on the previous comments.
The issue is caused by EdgeNGramFilterFactory not working as expected: instead of giving each generated gram its own offsets, it reports incorrect offsets when generating tokens. This issue has been reopened in JIRA several times over the past few versions of Solr.
I solved it in production by setting luceneMatchVersion="4.5" (or whatever version works for you) on the EdgeNGramFilterFactory.
I got this solution from a JIRA comment, but I can't find it now; apologies for not being able to add a reference.
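For illustration, here is a sketch of the question's field type with that override applied to the index-time filter only. It assumes the rest of the schema is unchanged; the value "4.5" is simply the one reported to work in this answer and may need to be adjusted (or may not be accepted at all) depending on your Solr release.

```xml
<fieldType name="autocomplete_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Override reported in this answer: pin the edge n-gram filter to an older
         luceneMatchVersion, intended to bring back the earlier offset behaviour.
         "4.5" is the value that worked here; other versions may be needed. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" luceneMatchVersion="4.5"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The query analyzer is left as in the question, since n-grams are only produced at index time. After changing the schema, the documents need to be reindexed so the stored offsets reflect the new analysis.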