Statistical data preprocessing methods in distance functions to enhance k-means clustering algorithm