elasticsearch – Removing tokens that do not contain any special characters from a token stream

I am getting a token stream that contains purely alphanumeric tokens as well as tokens that mix alphanumeric and special characters. I want the resulting token stream to be free of the purely alphanumeric tokens, that is, to contain only tokens that combine alphanumeric and special characters. Can anyone please help me achieve this?

Currently, I am getting the tokens as follows:

curl 'http://localhost:9200/test/_analyze?analyzer=special_analyzer&pretty' -d ':port&data'
{
  "tokens" : [
    {
      "token" : ":port&data",
      "start_offset" : 0,
      "end_offset" : 10,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : ":port&",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "port&data",
      "start_offset" : 1,
      "end_offset" : 10,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "port",
      "start_offset" : 1,
      "end_offset" : 5,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : ":port",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "&data",
      "start_offset" : 5,
      "end_offset" : 10,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "port&",
      "start_offset" : 1,
      "end_offset" : 6,
      "type" : "word",
      "position" : 0
    }
  ]
}

I want to remove the token ‘port’ because it does not contain any special characters.
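
To make the desired rule concrete, here is a minimal Python sketch that applies it to the token list above. It is not Elasticsearch code. It only illustrates the expected result, and it assumes that "special character" means anything other than an ASCII letter or digit. The function name `keep_special_tokens` is hypothetical.

```python
import re

# Assumption: a "special character" is anything that is not an ASCII letter or digit.
SPECIAL = re.compile(r"[^A-Za-z0-9]")


def keep_special_tokens(tokens):
    """Return only the tokens that contain at least one special character."""
    return [t for t in tokens if SPECIAL.search(t)]


# Token texts copied from the _analyze output above.
tokens = [":port&data", ":port&", "port&data", "port", ":port", "&data", "port&"]

filtered = keep_special_tokens(tokens)
print(filtered)
# → [':port&data', ':port&', 'port&data', ':port', '&data', 'port&']
```

Inside the analyzer itself, one possible (untested here) way to express the same rule might be a `pattern_replace` token filter that replaces purely alphanumeric tokens with an empty string, followed by a `length` token filter with `min` set to 1 to drop the emptied tokens.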
