Data Visualization and Python

Data visualization is fed by data models and used to make decisions in the data science process.

Data visualization first requires data collection and cleaning, which can be streamlined using Python. Source: Farcaster at English Wikipedia.

A PyMNtos Special Event Starring Brian Lehman

How does Twitter use Python to generate social media data visualizations? Twitter data scientist Brian Lehman took a break from the Minneapolis eyeo festival to address a PyMNtos special event hosted by SPS Commerce last night. Part of his talk focused on the development process, using the “Languages of Twitter” graphic as the example. The graphic was published in March 2014 on the New York Times Bits blog, when Lehman was working for Gnip, a social data analytics firm. The following month, Twitter announced it had purchased Gnip for $134M.

Data Visualization with a Marketing Tie-In

The marketing department wanted to visually represent the value proposition of Gnip, and Gnip’s business is visual storytelling around social media. With access to Twitter’s raw data stream, Gnip’s data scientists went back and forth with marketing, sampling data and creating graphics to support a compelling tagline. The final product does not require a tagline, however, since the worldwide adoption of Twitter implicit in the colorful growth lines fulfills the marketing requirement.

Instead, upon inspection, the graphic generates questions. What caused the spike in Arabic-language users in 2010? Why does the Chinese language line appear to dip? What else can we learn from Twitter data? Lehman and his colleagues approach data from a learning perspective, rather than marketing per se, choosing a graphic that invites viewers to refine their question-asking technique and demonstrating a central function of data science. Effective collaboration with marketing, the client and project owner, is nonetheless implicit.

Data Science Tactics

Lehman provided data analysis and graphics code examples from his work, written with open-source tools including R, Python, and D3. He cautioned that choosing the appropriate tool for each step takes self-control; exploring the latest tools is not necessarily the best way to move a project forward. In the examples, Python parsed the raw data and added a practical date format for grouping, R produced the final data tables and draft graphics, and D3 created the final HTML-based graphic.
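To make the Python step concrete, here is a minimal sketch of parsing raw records and adding a month key for grouping. It is not Lehman's code. The sample records, the field names posted_time and lang, and the helper names month_key and count_by_month_and_language are all assumptions made up for illustration.

```python
import json
from collections import Counter
from datetime import datetime

# Hypothetical sample input: one JSON record per line. The field names
# "posted_time" and "lang" are assumptions, not a real Twitter or Gnip schema.
RAW_LINES = [
    '{"posted_time": "2013-11-02T14:05:00Z", "lang": "en"}',
    '{"posted_time": "2013-11-17T08:30:00Z", "lang": "ar"}',
    '{"posted_time": "2013-12-01T23:59:00Z", "lang": "en"}',
    '{"posted_time": "2013-11-20T10:00:00Z", "lang": "en"}',
    'not valid json',
]


def month_key(timestamp):
    """Turn an ISO-8601 UTC timestamp into a 'YYYY-MM' grouping key."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.strftime("%Y-%m")


def count_by_month_and_language(lines):
    """Count records per (month, language) pair, skipping unusable lines."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
            key = (month_key(record["posted_time"]), record["lang"])
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, a missing field, or an unparseable date.
            continue
        counts[key] += 1
    return counts


counts = count_by_month_and_language(RAW_LINES)
for (month, lang), n in sorted(counts.items()):
    print(month, lang, n)
```

A table of counts keyed by month and language like this one is the kind of intermediate result that could then be handed to R or D3 for plotting.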

The creative process surrounding data visualization is iterative; data can tell stories that no one wants to hear, leading to new marketing hypotheses. For those who didn’t catch all the details, the example code is on GitHub, and Lehman can be reached on Twitter, naturally: @BRIANLEHMAN. Stay tuned to the Brownie Group blog for more Python-based data visualization strategies in an upcoming post.