Mastering Social Media Mining with Python by 2016

Author: 2016, Date: September 16, 2017

Language: eng
Format: epub, mobi
Publisher: Packt Publishing

Visualizing posts as a word cloud

After analyzing interactions, we move our attention back to the content of the posts.

Word clouds, also called tag clouds (https://en.wikipedia.org/wiki/Tag_cloud), are visual representations of textual data. The importance of each word is usually represented by its size in the image.

In this section, we will use the wordcloud Python package, which provides an extremely easy way to produce word clouds. Firstly, we need to install the library and its dependency (an imaging library) in our virtual environment using the following commands:

$ pip install wordcloud
$ pip install Pillow

Pillow is a fork of the old Python Imaging Library (PIL) project, as PIL has apparently been discontinued. Among its features, Pillow supports Python 3, so after this brief installation, we're good to go.

The following script reads a .jsonl file, like the one produced earlier to store the posts from PacktPub, and creates a .png file with the word cloud:

# Chap04/facebook_posts_wordcloud.py
import os
import json
from argparse import ArgumentParser
import matplotlib.pyplot as plt
# Requires the NLTK stop word corpus: nltk.download('stopwords')
from nltk.corpus import stopwords
from wordcloud import WordCloud

def get_parser():
    parser = ArgumentParser()
    parser.add_argument('--page')
    return parser

if __name__ == '__main__':
    parser = get_parser()
    args = parser.parse_args()

    # Read one JSON document per line and keep only the message text
    fname = "posts_{}.jsonl".format(args.page)
    all_posts = []
    with open(fname) as f:
        for line in f:
            post = json.loads(line)
            all_posts.append(post.get('message', ''))
    text = ' '.join(all_posts)
    # Custom stop words plus the standard English list from NLTK
    stop_list = ['save', 'free', 'today',
                 'get', 'title', 'titles', 'bit', 'ly']
    stop_list.extend(stopwords.words('english'))
    wordcloud = WordCloud(stopwords=stop_list).generate(text)
    # Render the cloud without axes and save it as a .png file
    plt.imshow(wordcloud)
    plt.axis("off")
    image_fname = 'wordcloud_{}.png'.format(args.page)
    plt.savefig(image_fname)

As usual, the script uses an instance of ArgumentParser to get the command-line parameter (the Page name or Page ID).
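To see how the command-line parameter turns into a file name, here is a minimal sketch that passes the arguments as an explicit list instead of reading them from the shell. The required=True flag and the help text are assumptions added for illustration; the script above declares --page without them.

```python
from argparse import ArgumentParser

def get_parser():
    parser = ArgumentParser()
    # required=True and help= are illustrative additions, not in the original script
    parser.add_argument('--page', required=True,
                        help='Facebook Page name or Page ID')
    return parser

# Passing a list to parse_args() simulates: python script.py --page PacktPub
args = get_parser().parse_args(['--page', 'PacktPub'])
fname = "posts_{}.jsonl".format(args.page)
image_fname = 'wordcloud_{}.png'.format(args.page)
print(fname, image_fname)
```

Making the argument required is one way to avoid looking for a file called posts_None.jsonl when the option is omitted.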

The script creates a list, all_posts, with the textual message of each post. We use post.get('message', '') instead of accessing the dictionary directly, as the message key might not be present in every post (for example, in the case of images without comment), even though this event is quite rare.
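The difference between the two access styles can be sketched with two made-up JSON lines, one with a message key and one without. The sample records below are hypothetical and only mimic the shape of the stored posts.

```python
import json

# Hypothetical .jsonl content: the second post has no 'message' key
lines = [
    '{"id": "1", "message": "Learn Python today"}',
    '{"id": "2", "picture": "http://example.com/img.png"}',
]

all_posts = []
for line in lines:
    post = json.loads(line)
    # post['message'] would raise KeyError on the second post;
    # post.get() falls back to an empty string instead
    all_posts.append(post.get('message', ''))

text = ' '.join(all_posts)
print(all_posts)
```

Posts without a message contribute only an empty string, so they have no effect on the words that end up in the cloud.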

The list of posts is then concatenated into a single string, text, which will be the main input to generate the word cloud. The WordCloud object takes some optional parameters to define some aspects of the word cloud. In particular, the example uses the stopwords argument to define a list of words that will be removed from the word cloud. The words that we include in this list are the standard English stop words as defined in the Natural Language Toolkit (NLTK) library, as well as a few custom keywords that are often used in the PacktPub account but that do not really carry interesting meaning (for example, links to bit.ly and references to offers for particular titles).
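To get a feel for the effect of the stop word list, the following sketch counts word frequencies with and without the filtered terms, using only the standard library. It is a rough approximation of the idea, not the internal logic of the wordcloud package; the count_words helper, the sample text, and the tiny English stop list are all hypothetical stand-ins.

```python
import re
from collections import Counter

def count_words(text, stop_words):
    """Count lowercase word tokens, skipping anything in stop_words."""
    stop_set = {w.lower() for w in stop_words}
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_set)

# Hypothetical post text in the style of a promotional message
sample_text = ("Get this title free today: http://bit.ly/abc "
               "Python data mining with Python")

custom_stop = ['save', 'free', 'today', 'get', 'title', 'titles', 'bit', 'ly']
# Stand-in for stopwords.words('english'), which needs the NLTK corpus
english_stop = ['this', 'with', 'the', 'a']

counts = count_words(sample_text, custom_stop + english_stop)
print(counts.most_common(3))
```

Words with higher counts are the ones a word cloud would draw larger, which is why removing promotional terms and link fragments lets the topical words stand out.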

An example of the output image is shown in Figure 4.9:
