Papers
arxiv:1901.00439

Deep Representation Learning for Clustering of Health Tweets

Published on Dec 25, 2018

Abstract

Twitter has been a prominent social media platform for mining population-level health data and accurate clustering of health-related tweets into topics is important for extracting relevant health insights. In this work, we propose deep convolutional autoencoders for learning compact representations of health-related tweets, further to be employed in clustering. We compare our method to several conventional tweet representation methods including bag-of-words, term frequency-inverse document frequency, Latent Dirichlet Allocation and Non-negative Matrix Factorization with 3 different clustering algorithms. Our results show that the clustering performance using proposed representation learning scheme significantly outperforms that of conventional methods for all experiments of different number of clusters. In addition, we propose a constraint on the learned representations during the neural network training in order to further enhance the clustering performance. All in all, this study introduces utilization of deep neural network-based architectures, i.e., deep convolutional autoencoders, for learning informative representations of health-related tweets.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/1901.00439 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/1901.00439 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/1901.00439 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.