NORMA eResearch @NCI Library

Scalable data analytics using crowdsourced repositories and streams

Veloso, Bruno, Leal, Fátima, González-Vélez, Horacio, Malheiro, Benedita and Burguillo, Juan Carlos (2018) Scalable data analytics using crowdsourced repositories and streams. Journal of Parallel and Distributed Computing, 122. pp. 1-10. ISSN 0743-7315

Full text not available from this repository.
Official URL: http://dx.doi.org/10.1016/j.jpdc.2018.06.013

Abstract

The scalable analysis of crowdsourced data repositories and streams has quickly become a critical experimental asset in multiple fields. It enables the systematic aggregation of otherwise dispersed data sources and their efficient processing using significant amounts of computational resources. However, the considerable amount of crowdsourced social data and the numerous criteria to observe can limit analytical off-line and on-line processing due to the intrinsic computational complexity. This paper demonstrates the efficient parallelisation of profiling and recommendation algorithms using tourism crowdsourced data repositories and streams. Using the Yelp data set for restaurants, we have explored two different profiling approaches: entity-based and feature-based using ratings, comments, and location. Concerning recommendation, we use a collaborative recommendation filter employing singular value decomposition with stochastic gradient descent (SVD-SGD). To accurately compute the final recommendations, we have applied post-recommendation filters based on venue suitability, value for money, and sentiment. Additionally, we have built a social graph for enrichment. Our master–worker implementation shows super-linear scalability for 10, 20, 30, 40, 50, and 60 concurrent instances.

Item Type: Article
Subjects: Q Science > QA Mathematics > Electronic computers. Computer science
T Technology > T Technology (General) > Information Technology > Electronic computers. Computer science
H Social Sciences > HD Industries. Land use. Labor > Specific Industries > Tourism Industry
Divisions: School of Computing > Staff Research and Publications
Depositing User: Caoimhe Ní Mhaicín
Date Deposited: 10 Jul 2018 14:39
Last Modified: 10 Jul 2018 15:53
URI: https://norma.ncirl.ie/id/eprint/3069
