python – Regex that removes whitespaces between two specific characters
In PySpark, I have the following expression:
from pyspark.sql.functions import lower, regexp_replace
df.withColumn('new_descriptions', lower(regexp_replace('descriptions', r"\t+", '')))
This removes tab characters and converts my descriptions column to lowercase.
Here are some sample values from my descriptions column:
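To check this step locally without starting Spark, a minimal pure-Python sketch of the same transformation might look like the following. It assumes Python's `re` treats the simple `\t+` pattern the same way Spark's Java-based regex does; `normalize_description` is a hypothetical helper name, not part of PySpark.

```python
import re


def normalize_description(text: str) -> str:
    """Rough local mirror of lower(regexp_replace(col, r"\t+", ''))."""
    # Delete every run of tab characters, then lowercase what is left.
    return re.sub(r"\t+", "", text).lower()


print(normalize_description("\tBanha Frimesa 450 gr\t"))
```

Note that tabs are deleted rather than replaced with a space, so a tab sitting between two words would join them.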
['banha frimesa 450 gr','manteiga com sal tourinho pote 200 g','acucar refinado caravelas pacote 1kg',
'acucar refinado light uniao fit pacote 500g','farinha de trigo especial 101 5kg']
What I want is to remove the whitespace between a value and its unit.
For example, banha frimesa 450 gr should become banha frimesa 450gr.
However, I must not remove the whitespace between a plain number and a number that carries a unit.
For example, farinha de trigo especial 101 5kg should stay the same.
What regex should I use to remove only the whitespace between a value and its unit (kg, ml, l, g)?
Desired result:
['banha frimesa 450gr','manteiga com sal tourinho pote 200g','acucar refinado caravelas pacote 1kg',
'acucar refinado light uniao fit pacote 500g','farinha de trigo especial 101 5kg']
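One possible approach, sketched here in plain Python so it can be checked against the samples above, is to match whitespace only when it is preceded by a digit and followed by a unit (kg, ml, l, g, or gr) that ends at a word boundary. The lookbehind `(?<=\d)` and the lookahead mean that only the whitespace itself is consumed, so the replacement is just an empty string. `UNIT_SPACE` and `join_value_and_unit` are hypothetical names chosen for illustration.

```python
import re

# Whitespace preceded by a digit and followed by a unit (kg, ml, l, g, gr)
# that ends at a word boundary. In "101 5kg" the space is followed by "5",
# not by a unit, so it is left alone. The trailing \b keeps words such as
# "lata" from being treated as the unit "l".
UNIT_SPACE = re.compile(r"(?<=\d)\s+(?=(?:kg|ml|l|gr?)\b)")


def join_value_and_unit(text: str) -> str:
    """Remove the whitespace between a number and the unit that follows it."""
    return UNIT_SPACE.sub("", text)


samples = [
    'banha frimesa 450 gr',
    'manteiga com sal tourinho pote 200 g',
    'acucar refinado caravelas pacote 1kg',
    'acucar refinado light uniao fit pacote 500g',
    'farinha de trigo especial 101 5kg',
]

for s in samples:
    print(join_value_and_unit(s))
```

Spark's `regexp_replace` uses Java regular expressions, which also support lookbehind, lookahead, and `\b`, so the same pattern string should work when chained after the existing expression, for example `regexp_replace(lower(regexp_replace('descriptions', r"\t+", '')), r"(?<=\d)\s+(?=(?:kg|ml|l|gr?)\b)", '')`. Treat that as untested. Applying it after `lower` means uppercase units such as "KG" are also matched.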
