Solr Query Tuning, Part 1: The Difference Between query and filter query

Solr has two query parameters: query (q) and filter query (fq). The official documentation does not spell out exactly how the two differ.

The official documentation describes fq as follows: https://lucene.apache.org/solr/guide/7_3/common-query-parameters.html#fq-filter-query-parameter

The fq parameter defines a query that can be used to restrict the superset of documents that can be returned, without influencing score. It can be very useful for speeding up complex queries, since the queries specified with fq are cached independently of the main query. When a later query uses the same filter, there’s a cache hit, and filter results are returned quickly from the cache.
When using the fq parameter, keep in mind the following:

  • The fq parameter can be specified multiple times in a query. Documents will only be included in the result if they are in the intersection of the document sets resulting from each instance of the parameter. In the example below, only documents which have a popularity greater then 10 and have a section of 0 will match.
      fq=popularity:[10 TO *]&fq=section:0
  • Filter queries can involve complicated Boolean queries. The above example could also be written as a single fq with two mandatory clauses like so:
      fq=+popularity:[10 TO *] +section:0
  • The document sets from each filter query are cached independently. Thus, concerning the previous examples: use a single fq containing two mandatory clauses if those clauses appear together often, and use two separate fq parameters if they are relatively independent. (To learn about tuning cache sizes and making sure a filter cache actually exists, see The Well-Configured Solr Instance.)
  • It is also possible to use filter(condition) syntax inside the fq to cache clauses individually and – among other things – to achieve union of cached filter queries.
  • As with all parameters: special characters in an URL need to be properly escaped and encoded as hex values. Online tools are available to help you with URL-encoding. For example: http://meyerweb.com/eric/tools/dencoder/.
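
The two forms quoted above (two separate fq parameters versus a single fq with two mandatory clauses) can be sketched as an encoded query string. This is only an illustration using Python's standard library: the helper name build_query_string and the match-all main query *:* are assumptions made for this example, not something taken from the documentation excerpt.

```python
from urllib.parse import urlencode, parse_qs


def build_query_string(q, filters):
    """Encode one q parameter followed by one fq parameter per filter (hypothetical helper)."""
    params = [("q", q)]
    # A list of tuples lets the same key ("fq") appear more than once.
    params.extend(("fq", fq) for fq in filters)
    # urlencode percent-encodes special characters such as ':', '[', '*' and '+'.
    return urlencode(params)


# Two independent filters: each one gets its own fq parameter.
separate = build_query_string("*:*", ["popularity:[10 TO *]", "section:0"])

# The same restriction as a single fq with two mandatory clauses.
combined = build_query_string("*:*", ["+popularity:[10 TO *] +section:0"])

print(separate)
print(combined)
# Decoding shows the original filter text survives the round trip.
print(parse_qs(combined)["fq"])
```

The literal + in the second form has to be encoded: left unencoded in a URL it would be decoded as a space, and the clause would lose its "mandatory" marker.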

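q and fq are not easy to tell apart, but understanding exactly how they differ can bring a significant performance gain. The main differences are as follows:
1. q is also called the main query; fq stands for filter query.
2. Relevance scoring
fq has a single purpose: finding the documents that satisfy a condition. q has two purposes: (1) finding the documents that satisfy a condition, and (2) computing a relevance score for the returned documents against the search keywords. You can therefore think of q as a special filter that performs one extra step, relevance scoring. In practice, put the keywords the user typed into q, so that the most relevant documents for that search come first, for example keyword=apache solr, and put the enumerated fields the user picked from drop-down lists into fq parameters, for example category=technology.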
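
As a rough sketch of that split, the snippet below sends the free-text keywords to q and turns each drop-down selection into its own fq. The function name build_search_params and the quoting of values are assumptions made for illustration; only the category field and the example values come from the text above.

```python
def build_search_params(keyword, selections):
    """Return request parameters: keywords go to q, enum selections go to fq.

    Hypothetical helper: `selections` maps a field name to the value the
    user picked from a drop-down list.
    """
    params = [("q", keyword)]  # scored against the user's keywords
    for field, value in sorted(selections.items()):
        # Escape backslashes and quotes so the value stays one quoted phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        # One fq per field: not scored, and cacheable independently of q.
        params.append(("fq", f'{field}:"{escaped}"'))
    return params


params = build_search_params("apache solr", {"category": "technology"})
print(params)
```

Keeping one fq per selected field means a popular selection can be reused from the cache no matter which keywords accompany it.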
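3. Caching and execution speed
Separating the filter query from the main query serves two purposes:
  1. A filter query can use the filter cache.
  2. A filter query skips the expensive relevance scoring, so it executes faster.
4. Multiple fq parameters can be specified, but only one q.
5. Execution order
Does fq run first, or q? The many documents I read disagree with one another. The answer in Solr in Action seems the most reliable: the execution order depends on the situation.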

  1. Each fq parameter is first looked up in the filter cache to see whether its DocSet is already there.
  2. If the fq is not found in the filter cache, the index is searched and the resulting DocSet is put into the cache.
  3. The DocSets of all filters are intersected, producing a single combined DocSet.
  4. The q parameter is passed in (along with the filter DocSet) to be executed as a Lucene query. When executing the query, Lucene plays leapfrog between the query and combined filters, advancing both the query and filter results objects to their next present internal ID (an integer). When both the query result and filter result objects contain the same ID, that ID is collected, a process that includes generating the relevancy score for the document.
     Roughly speaking, the results of q are intersected with the combined filter results from the previous steps, and a relevance score is computed for each document in that intersection.
  5. Post filters are executed.
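
The five steps above can be imitated with a small in-memory simulation. This is a toy model, not Solr or Lucene code: the FilterCache class, the precomputed scores dictionary, and the sample document IDs are all assumptions made for illustration. In particular, real scoring happens while documents are collected, whereas here a dictionary lookup stands in for it.

```python
class FilterCache:
    """Toy filter cache: maps an fq string to its set of matching doc IDs."""

    def __init__(self, index):
        self.index = index  # hypothetical stand-in for the index: fq -> doc IDs
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def docset(self, fq):
        if fq in self.cache:  # step 1: look in the cache first
            self.hits += 1
        else:  # step 2: on a miss, "search the index" and cache the result
            self.misses += 1
            self.cache[fq] = frozenset(self.index.get(fq, ()))
        return self.cache[fq]


def leapfrog(query_ids, filter_ids):
    """Intersect two sorted ID lists by always advancing the side that is behind."""
    matched, i, j = [], 0, 0
    while i < len(query_ids) and j < len(filter_ids):
        if query_ids[i] == filter_ids[j]:
            matched.append(query_ids[i])  # both sides agree: collect the ID
            i += 1
            j += 1
        elif query_ids[i] < filter_ids[j]:
            i += 1
        else:
            j += 1
    return matched


def search(scores, fqs, cache, post_filter=None):
    """scores: doc ID -> relevance score for documents matching q (assumed precomputed)."""
    docsets = [cache.docset(fq) for fq in fqs]  # steps 1-2
    if docsets:
        combined = sorted(frozenset.intersection(*docsets))  # step 3
    else:
        combined = sorted(scores)  # no filters: every q match passes
    matched = leapfrog(sorted(scores), combined)  # step 4
    results = [(doc_id, scores[doc_id]) for doc_id in matched]
    if post_filter is not None:  # step 5
        results = [r for r in results if post_filter(r[0])]
    return sorted(results, key=lambda r: r[1], reverse=True)


index = {"popularity:[10 TO *]": [1, 2, 3, 5, 8], "section:0": [2, 3, 4, 8, 9]}
scores = {2: 0.4, 3: 1.7, 7: 0.9, 8: 1.1}
cache = FilterCache(index)

first = search(scores, ["popularity:[10 TO *]", "section:0"], cache)
second = search(scores, ["section:0", "popularity:[10 TO *]"], cache)
print(first, cache.misses, cache.hits)
```

In this model the second search finds both filters in the cache even though they are listed in a different order, because each fq is cached on its own.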

References:
1. Solr in Action
