Evaluating large-scale text mining applications beyond the traditional numeric performance measures