Abstract
A parallel algorithm based on the MapReduce framework for discovering hot spots in commodity reviews (the PR-HD algorithm) is proposed. The PR-HD algorithm uses a web crawler to collect the review data of a popular mobile phone from an e-commerce platform and builds a review data set from it. The weight of each feature word is calculated with the TF-IDF algorithm, and its final weight is obtained by adding the position weight of the feature word; these weights are used to build a vector space model (VSM). The similarity between comment sentences is then computed on the VSM, and the Canopy and K-means algorithms are combined to cluster the sentences and thereby discover hot spots in the commodity reviews. This allows product developers to obtain more direct and effective suggestions and feedback.
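The pipeline summarized in the abstract can be illustrated with a minimal single-process sketch. It is not the PR-HD implementation and omits the MapReduce parallelization entirely. Several details are assumptions made only for illustration: the position weight is taken to be a fixed bonus (`position_bonus`) for words among the first `lead` tokens of a review, the IDF is smoothed, distance is defined as one minus cosine similarity, and the thresholds `t1` and `t2` and the toy reviews are hypothetical. All function names are invented for this sketch.

```python
import math
from collections import Counter


def weighted_tfidf(docs, lead=1, position_bonus=0.2):
    """Build one sparse vector (term -> weight) per tokenized review.

    Weight = TF * smoothed IDF, plus an additive position bonus for terms
    among the first `lead` tokens (an assumed form of the position weight).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        vec = {}
        for term, count in Counter(doc).items():
            idf = math.log((1 + n) / (1 + df[term])) + 1.0
            weight = (count / len(doc)) * idf
            if term in doc[:lead]:
                weight += position_bonus
            vec[term] = weight
        vectors.append(vec)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def canopy(vectors, t1=0.9, t2=0.7):
    """Canopy pre-clustering with loose threshold t1 and tight threshold t2.

    Returns a list of (center_index, member_indices) pairs.
    """
    remaining = list(range(len(vectors)))
    canopies = []
    while remaining:
        center = remaining[0]
        dist = {i: 1.0 - cosine(vectors[center], vectors[i]) for i in remaining}
        members = [i for i in remaining if dist[i] <= t1]
        canopies.append((center, members))
        # Points within the tight threshold can no longer become centers.
        remaining = [i for i in remaining if i != center and dist[i] > t2]
    return canopies


def kmeans(vectors, initial_centers, iterations=10):
    """K-means by cosine similarity, seeded with the given centers."""
    centers = [dict(c) for c in initial_centers]
    labels = []
    for _ in range(iterations):
        labels = [
            max(range(len(centers)), key=lambda c: cosine(v, centers[c]))
            for v in vectors
        ]
        for c in range(len(centers)):
            group = [vectors[i] for i, lab in enumerate(labels) if lab == c]
            if not group:
                continue  # keep the previous center for an empty cluster
            total = {}
            for v in group:
                for t, w in v.items():
                    total[t] = total.get(t, 0.0) + w
            centers[c] = {t: w / len(group) for t, w in total.items()}
    return labels


# Hypothetical, already tokenized reviews.
reviews = [
    ["battery", "life", "short"],
    ["battery", "drains", "fast"],
    ["screen", "bright", "clear"],
    ["screen", "clear", "sharp"],
]
vectors = weighted_tfidf(reviews)
canopies = canopy(vectors)
labels = kmeans(vectors, [vectors[center] for center, _ in canopies])
print(labels)  # prints [0, 0, 1, 1]
```

In this sketch Canopy serves only to choose the number of clusters and their initial centers, which K-means then refines; the two battery reviews and the two screen reviews end up in separate clusters.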