lucene – Elasticsearch: exclude a word from a wildcard search without excluding the whole document

I’m currently facing a seemingly simple task that turns out not to be so simple.

While querying documents with Elasticsearch, I want to prevent certain words from counting as a query hit, BUT without excluding the whole document if other words in the query do hit.

For example:

I’m searching on house.*, and the word houseplant should not count as a hit. But if a document contains both houseplant and housemate, I still need to see that document.
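
To make the intended behaviour concrete, here is a minimal sketch in plain Python. It only models the desired hit semantics for the wildcard part of the search; it is not Elasticsearch code, and the names `wanted_hits`, `should_return`, `WILDCARD` and `EXCLUDED` are hypothetical, chosen purely for illustration. The tokenization (lowercasing and splitting on word characters) is also an assumption, not what any particular analyzer does.

```python
import re

# Hypothetical names; this models the desired behaviour only,
# it is not an Elasticsearch API.
WILDCARD = re.compile(r"house.*")   # the pattern being searched for
EXCLUDED = {"houseplant"}           # terms that must not count as hits


def wanted_hits(text):
    """Return the terms that should count as hits for house.*"""
    # Assumed tokenization: lowercase, split on runs of word characters.
    terms = re.findall(r"\w+", text.lower())
    return [t for t in terms if WILDCARD.fullmatch(t) and t not in EXCLUDED]


def should_return(text):
    """A document is kept as long as at least one wanted hit remains."""
    return bool(wanted_hits(text))


doc_both = "My housemate bought a houseplant"
doc_only_excluded = "Only a houseplant here"

print(wanted_hits(doc_both))             # houseplant is filtered out of the hits
print(should_return(doc_both))           # document is still returned
print(should_return(doc_only_excluded))  # no remaining hit, so not returned
```

The point is that the exclusion applies to individual terms, not to documents: a document is dropped only when no other term hits.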

We’ve already tried the following:

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "documents",
            "inner_hits": {
              "size": 100,
              "highlight": {
                "fields": {
                  "documents.content": {
                    "number_of_fragments": 0
                  }
                }
              }
            },
            "query": {
              "bool": {
                "should": [
                  {
                    "regexp": {
                      "documents.content": "house.*&~(houseplant)"
                    }
                  },
                  {
                    "query_string": {
                      "default_field": "documents.content",
                      "fuzziness": 0,
                      "query": "room villa social"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

However, houseplant is still highlighted as a hit.
Many of the other options we tried exclude the entire document when houseplant is excluded from the wildcard search. Any document that contains housemate is then dropped just because it also contains houseplant.

I still want to see those documents, since they remain relevant because of the other hit.
