[[stopwords-relavance]]
=== Stopwords and Relevance

The last topic to cover before moving on from stopwords((("stopwords", "relevance and")))((("relevance", "stopwords and"))) is relevance.
Leaving stopwords in your index can make the relevance calculation less accurate, especially if your documents are very long.

As we have already discussed in <<bm25-saturation>>, the((("BM25", "term frequency saturation"))) reason for this is that <<tfidf,term frequency/inverse document frequency>> doesn't impose an upper limit on the impact of term frequency.((("Term Frequency/Inverse Document Frequency (TF/IDF) similarity algorithm", "stopwords and")))
Very common words may receive a low weight because of inverse document frequency, but in long documents the sheer number of times a stopword occurs in a single document can artificially boost its weight.

On long fields that include stopwords, you may want to consider using the <<bm25,Okapi BM25>> similarity instead of the default Lucene TF/IDF similarity.
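To make the saturation argument concrete, here is a minimal Python sketch that compares only the term-frequency factor of each similarity, ignoring inverse document frequency and field-length normalization. It assumes Lucene's classic TF/IDF term-frequency factor of `sqrt(freq)` and the textbook Okapi BM25 term-frequency factor with the common defaults `k1 = 1.2` and `b = 0.75`. The function names are hypothetical, and Lucene's actual BM25 code may differ by constant factors, so treat this as an illustration rather than the real scoring implementation.

```python
import math

# Assumed BM25 parameters: the commonly used defaults k1 = 1.2 and b = 0.75.
K1 = 1.2
B = 0.75


def tfidf_tf(freq: int) -> float:
    """Hypothetical helper: term-frequency factor of classic TF/IDF, sqrt(freq).

    It grows without bound as freq grows, so a stopword repeated many
    times keeps adding weight.
    """
    return math.sqrt(freq)


def bm25_tf(freq: int, doc_len: int, avg_doc_len: float,
            k1: float = K1, b: float = B) -> float:
    """Hypothetical helper: term-frequency factor of textbook Okapi BM25.

    As freq grows, the value approaches k1 + 1 but never exceeds it
    (term-frequency saturation). Longer-than-average fields (doc_len >
    avg_doc_len) increase the denominator and so saturate more slowly.
    """
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return freq * (k1 + 1) / (freq + length_norm)


if __name__ == "__main__":
    # Hypothetical scenario: a stopword such as "the" occurring more and
    # more often in a field whose length equals the average field length.
    for freq in (1, 10, 100, 1000):
        print(f"freq={freq:5d}  "
              f"tfidf_tf={tfidf_tf(freq):7.2f}  "
              f"bm25_tf={bm25_tf(freq, doc_len=1000, avg_doc_len=1000):5.2f}")
```

Running it, the TF/IDF factor keeps climbing (1, about 3.2, 10, about 31.6) as the stopword repeats, while the BM25 factor levels off just below `k1 + 1 = 2.2`. That cap is why BM25 limits how much a frequently repeated stopword can inflate a long document's score.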