Mastering Social Media Mining with Python by 2016
Author:2016
Language: eng
Format: epub, mobi
Publisher: Packt Publishing
Visualizing posts as a word cloud
After analyzing interactions, we move our attention back to the content of the posts.
Word clouds, also called tag clouds (https://en.wikipedia.org/wiki/Tag_cloud), are visual representations of textual data. The importance of each word is usually represented by its size in the image.
In this section, we will use the wordcloud Python package, which provides an extremely easy way to produce word clouds. Firstly, we need to install the library and its dependency (an imaging library) in our virtual environment using the following commands:
$ pip install wordcloud
$ pip install Pillow
Pillow is a fork of the old Python Imaging Library (PIL) project, as PIL has apparently been discontinued. Among its features, Pillow supports Python 3, so after this brief installation, we're good to go.
The following script reads a .jsonl file, like the one produced earlier to store the posts from PacktPub, and creates a .png file with the word cloud:
# Chap04/facebook_posts_wordcloud.py
import os
import json
from argparse import ArgumentParser
import matplotlib.pyplot as plt
from nltk.corpus import stopwords
from wordcloud import WordCloud

def get_parser():
    parser = ArgumentParser()
    parser.add_argument('--page')
    return parser

if __name__ == '__main__':
    parser = get_parser()
    args = parser.parse_args()
    fname = "posts_{}.jsonl".format(args.page)
    # Collect the text of every post (one JSON document per line)
    all_posts = []
    with open(fname) as f:
        for line in f:
            post = json.loads(line)
            all_posts.append(post.get('message', ''))
    text = ' '.join(all_posts)
    # Custom stop words, plus the NLTK English list
    # (needs the NLTK stopwords corpus: nltk.download('stopwords'))
    stop_list = ['save', 'free', 'today', 'get',
                 'title', 'titles', 'bit', 'ly']
    stop_list.extend(stopwords.words('english'))
    wordcloud = WordCloud(stopwords=stop_list).generate(text)
    # Render the cloud without axes and save it as a .png file
    plt.imshow(wordcloud)
    plt.axis("off")
    image_fname = 'wordcloud_{}.png'.format(args.page)
    plt.savefig(image_fname)
As usual, the script uses an instance of ArgumentParser to get the command-line parameter (the Page name or Page ID).
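To see how the --page value flows into the file names, the parser can be exercised without a shell by passing an argument list to parse_args(). This is only a minimal sketch: the required=True flag and the help text are additions that are not in the original script, and PacktPub is used as a sample value.

```python
from argparse import ArgumentParser


def get_parser():
    parser = ArgumentParser()
    # required=True is an assumption added for this sketch: without it,
    # a missing --page silently yields the file name "posts_None.jsonl".
    parser.add_argument('--page', required=True,
                        help='Facebook Page name or Page ID')
    return parser


# Passing a list to parse_args() mimics: python script.py --page PacktPub
args = get_parser().parse_args(['--page', 'PacktPub'])
input_fname = "posts_{}.jsonl".format(args.page)
image_fname = "wordcloud_{}.png".format(args.page)
print(input_fname)
print(image_fname)
```

Making the argument mandatory is a design choice; the original script leaves it optional.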
The script creates a list, all_posts, with the textual message of each post. We use post.get('message', '') instead of accessing the dictionary directly, as the message key might not be present in every post (for example, in the case of images without comment), even though this event is quite rare.
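The difference between the two access styles can be checked on a couple of made-up lines. The sample data below is hypothetical (it is not taken from the PacktPub file) and is held in memory with io.StringIO so that the sketch runs without any file on disk.

```python
import io
import json

# Hypothetical sample data: the second post has no 'message' key,
# as can happen for an image posted without a comment.
sample = io.StringIO(
    '{"id": "1", "message": "New Python title out today"}\n'
    '{"id": "2"}\n'
)

all_posts = []
for line in sample:
    post = json.loads(line)
    # post['message'] would raise KeyError on the second line;
    # get() falls back to an empty string instead.
    all_posts.append(post.get('message', ''))

print(all_posts)
```

The empty string contributes nothing to the joined text, so posts without a message simply drop out of the word cloud.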
The list of posts is then concatenated into a single string, text, which is the main input used to generate the word cloud. The WordCloud object accepts several optional parameters that control how the word cloud is built. In particular, the example uses the stopwords argument to define a list of words that will be removed from the word cloud. The words that we include in this list are the standard English stop words as defined in the Natural Language Toolkit (NLTK) library, as well as a few custom keywords that are often used in the PacktPub account but do not carry interesting meaning (for example, the fragments of bit.ly links and references to offers for particular titles).
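The effect of a stop list can be illustrated with the standard library alone. The snippet below is a sketch of the idea, not a description of how the wordcloud package works internally: the text and the short stop list are made up for the example, and the tokenization is deliberately simplistic.

```python
import re
from collections import Counter

# Hypothetical text and a tiny hand-written stop list; the book's script
# uses NLTK's English list plus the custom PacktPub keywords.
text = "Get the free Python title today. Python data mining with Python."
stop_list = {'get', 'the', 'free', 'title', 'today', 'with'}

# Lowercase the text and split it into alphabetic tokens (a simplification).
tokens = re.findall(r"[a-z]+", text.lower())
counts = Counter(t for t in tokens if t not in stop_list)

# The most frequent remaining words are the ones a word cloud
# would typically draw largest.
print(counts.most_common(3))
```

Without the stop list, promotional words such as "free" and "today" would compete with the topical words for space in the image.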
An example of the output image is shown in Figure 4.9.
