Text

Starburst in R

> starburst('Matrices!')
M   M   M
 a  a  a 
  t t t  
   rrr   
Matrices!
   ccc   
  e e e  
 s  s  s 
!   !   !
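
The output above is consistent with a simple construction: in an n-by-n grid of spaces, row i carries the i-th character on the main diagonal, in the middle column, and on the anti-diagonal, and the middle row spells out the whole word. The original definition of `starburst` is not shown in this post, so the following is only a sketch of one way to reproduce that output, assuming the input has an odd number of characters.

```r
# A sketch of starburst(); not necessarily the original implementation.
starburst <- function(word) {
  chars <- strsplit(word, "")[[1]]
  n <- length(chars)
  stopifnot(n %% 2 == 1)            # assumption: odd length, so a middle exists
  mid <- (n + 1) / 2

  grid <- matrix(" ", nrow = n, ncol = n)
  for (i in seq_len(n)) {
    # Row i: character i on the diagonal, the middle column, the anti-diagonal.
    grid[i, c(i, mid, n + 1 - i)] <- chars[i]
  }
  grid[mid, ] <- chars              # the middle row spells the whole word

  lines <- apply(grid, 1, paste, collapse = "")
  writeLines(lines)
  invisible(lines)
}

starburst("Matrices!")
```

Reading down the middle column also spells the word, because row i always places character i there.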

Link

ScraperWiki’s first US two-day Journalism Data Camp event, held in conjunction with the Tow Center for Digital Journalism at Columbia University and supported by the Knight Foundation, on February 3rd and 4th, 2012.

Photo

MARCH 16 UPDATE: My email scraping has become surprisingly controversial, so I’ve taken down the code and other plots for now. Ironically, I’ve also updated the plot.

I studied the emails sent to my dorm's email list and drew some plots. A little context should be enough for you to follow them.

Risley Hall is an arts-themed dorm at Cornell University for undergraduates of all years. Everyone who lives in the dorm is on the risleyhall-l mailing list. Until recently, anyone was allowed to send emails to it. Last fall, the powers that were decided to turn risleyhall-l into a moderated announcements list and to create an open discussion list called squidserve-l, named after the Risley mascot.

I used Thunderbird to save the emails in plain text and then used grep, sed and R to extract and plot information. The source code is here. Or clone the git repository.
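
The R side of that pipeline can be sketched as follows. This is not the code that was taken down; it is a minimal illustration that assumes the `Date:` header lines have already been pulled out of the saved emails (for example with grep), and the sample lines and variable names are made up.

```r
# Hypothetical input: one "Date:" header line per email (not the real data).
header_lines <- c(
  "Date: Mon, 3 Jan 2011 09:15:02 -0500",
  "Date: Mon, 3 Jan 2011 13:40:51 -0500",
  "Date: Wed, 5 Jan 2011 22:05:10 -0500"
)

# Drop the "Date:" label and the optional weekday.
stamps <- sub("^Date: *([A-Za-z]{3}, *)?", "", header_lines)

# as.Date() ignores trailing characters once the format has been matched.
# %b is locale-dependent; English month abbreviations are assumed here.
days <- as.Date(stamps, format = "%d %b %Y")

# Count emails per day, keeping days on which nothing was sent.
all_days <- seq(min(days), max(days), by = "day")
daily <- table(factor(as.character(days), levels = as.character(all_days)))
print(daily)

# Draw the daily activity only when a graphics device is available interactively.
if (interactive()) {
  plot(all_days, as.integer(daily), type = "h",
       xlab = "Date", ylab = "Emails per day")
}
```

Turning the dates into a factor with one level per calendar day is what makes quiet days show up as zeros rather than disappearing from the plot.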

The graph above shows daily activity over time. Activity has generally been increasing over the past three years. The highest-activity days were November 1, 2010, with 43 emails, and March 9, 2011, with 42 emails. On both days, nonsensical mailing list policy was being heavily discussed on the lists.

There are some consistent within-year activity patterns. Peaks of activity occur at the beginning of the year and at the end of October. Also, activity is lower from November to March, and there’s hardly any activity over breaks.

I’ll probably continue doodling with this for a while as a break from less frivolous activities. I’ve just started charting the occurrence of different words (regular expressions, actually) in emails. Check back in a couple of weeks and see what else I come up with.
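
Counting how many emails match each regular expression might look roughly like this in R. The email texts and patterns below are invented for illustration and are not taken from the mailing lists.

```r
# Hypothetical email bodies (not the real data).
emails <- c(
  "Can we please stop changing the list policy?",
  "Free pizza in the kitchen tonight",
  "Policy meeting moved; pizza will be provided"
)

# Hypothetical patterns; the names label the resulting counts.
patterns <- c(policy = "polic(y|ies)", pizza = "pizza", squid = "squid")

# For each pattern, count the emails that contain at least one match.
counts <- sapply(patterns, function(p) {
  sum(grepl(p, emails, ignore.case = TRUE))
})
print(counts)
```

Splitting the emails by date before counting would give one such vector per day, which could then be charted over time.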

Video

How many gifts did my true love give to me on all twelve nights of Christmas?

After seeing Information is Beautiful’s recent information animation, I decided that I’d make my own.

I used R to generate a PDF and then did a screencast of the PDF with ffmpeg to make a video with the appropriate timings. The code for the PDF is here.
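
The linked code is not reproduced here. As a rough sketch of the PDF step only, base R's `pdf()` device can write one page per day, and each page then serves as a frame for the screencast. The file name and the choice of a bar plot are assumptions.

```r
# Write a twelve-page PDF, one page per day of the song.
out_file <- file.path(tempdir(), "twelve-days.pdf")  # hypothetical file name

pdf(out_file, width = 6, height = 4, onefile = TRUE)
for (day in 1:12) {
  # On day d, gift k arrives k times, for k = 1..d.
  gifts_today <- seq_len(day)
  barplot(gifts_today, names.arg = seq_len(day), ylim = c(0, 12),
          xlab = "Gift number", ylab = "Items received",
          main = paste("Day", day))
}
invisible(dev.off())  # close the device so the file is complete
```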

Photo

How many of each gift did my true love give to me?
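
The arithmetic behind the question is short. Gift k first appears on day k and is repeated on every later day, so it is given on 13 - k days, k items each time. A small sketch in R, with variable names chosen for this example:

```r
gift <- 1:12

# Gift k is given on (13 - k) days, k items each time.
per_gift <- gift * (13 - gift)

# On day d the singer receives 1 + 2 + ... + d items.
per_day <- cumsum(1:12)

print(per_gift)       # 12 22 30 36 40 42 42 40 36 30 22 12
print(sum(per_gift))  # 364
print(sum(per_day))   # 364, the same total counted day by day
```

Gifts six and seven tie for the most items overall, with 42 each.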

Photo

Gifts my true love gave to me on all twelve days of Christmas

Photo

I had been wondering what impact my friending 200 people from my Gmail address book had, so I scraped the dates from the notification emails.

The plot shows notifications of friend requests from other people to me in black and confirmations of my requests to other people in red. That sudden and sharp increase at the end of the graph is from when I friended 200 people at once on May 17.

I also took the dates from all of the other types of notifications that I have Facebook send to my email. I’ll have more cool results from these data and from different transformations of the data. Interesting trends should appear when I take derivatives.
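
Assuming the plot is a running total of notifications over time, taking a derivative amounts to differencing that total. The dates below are made up for illustration.

```r
# Hypothetical notification dates (not the real data).
dates <- as.Date(c("2010-05-10", "2010-05-10", "2010-05-12",
                   "2010-05-13", "2010-05-13", "2010-05-13"))

# Daily counts, keeping days with no notifications.
all_days <- seq(min(dates), max(dates), by = "day")
daily <- as.integer(table(factor(as.character(dates),
                                 levels = as.character(all_days))))

cumulative <- cumsum(daily)                       # running total
first_diff <- diff(cumulative)                    # notifications per day
second_diff <- diff(cumulative, differences = 2)  # change in the daily rate

print(cumulative)
print(first_diff)
print(second_diff)
```

A burst such as friending many people at once would show up as a spike in the first difference.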

I’m also thinking about other ways to stratify the data. I could plot notifications by time of day or by person. Maybe I should get the number of pictures I was tagged in, in addition to just whether I was tagged. I should also plot total notifications of any sort, not stratified.