This post was originally posted on Medium.

News of the NSA’s programs motivates us to think harder about the information revealed by our digital activities. So what might the NSA see in my data? Better yet, what can I see?

Inspired by Stephen Wolfram’s post on personal analytics, I am trying to track and visualize more of my data. In May, I looked at my digital footprint on Twitter and Facebook, as well as my iPhone geolocation data from OpenPaths. The data I really wanted, though, was my email.

Immersion, an MIT Media Lab project, visualizes your Gmail as a network of people rather than a chronological sequence of text. To build this visualization, Immersion extracts all of your email headers. The team also makes your data available to download (thank you!).

Selecting only the emails I sent and merging that data set with the tweets I posted, I created the following graphic. The plot is dense and reveals a lot about my habits and patterns over time.

In the fall of ’10, you can see that I slept from 14:00 to 21:00. I was in China, 12 hours ahead of NYC, so that was actually 02:00 to 09:00 local time. You can see the earlier nights in both the ’11 and ’12 summers, when I was interning. Off-cycle email times also show when I left the country in May ’12, January ’13, April ’13, and May ’13.
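The 12-hour shift is a plain UTC-offset conversion. A minimal Python sketch (not the R used for the original plots), assuming timestamps are stored as ISO 8601 strings in UTC and using fixed offsets: UTC-4 for New York daylight time and UTC+8 for China. Fixed offsets are a simplification; once New York leaves daylight time in November, the gap grows to 13 hours.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets (assumption): New York on daylight time, China Standard Time.
NYC_EDT = timezone(timedelta(hours=-4))
CHINA_CST = timezone(timedelta(hours=8))


def to_local(ts_utc: str, tz: timezone) -> datetime:
    """Parse an ISO 8601 timestamp and express it in the given zone."""
    dt = datetime.fromisoformat(ts_utc)
    if dt.tzinfo is None:
        # Assumption: naive timestamps are already UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(tz)


if __name__ == "__main__":
    sent = "2010-10-15T18:00:00+00:00"  # hypothetical sent time
    print(to_local(sent, NYC_EDT).strftime("%H:%M"))    # 14:00, as drawn on the plot
    print(to_local(sent, CHINA_CST).strftime("%H:%M"))  # 02:00, local time in China
```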

Sleep Habits

Drilling in further, you can see changes in habits. You can see how my nights got later at the end of the semester with finals. You can also see the drop in traffic in the middle of March, while I was on spring break.

Looking at junior year, you can see that from October ’11 to April ’12 I was regularly awake well after 3am. I didn’t want to admit it at the time, but reflecting on the graphic, I can acknowledge that the poor sleep pattern was detrimental to my academic performance. I wonder what correlations the NSA could give us between sleep and performance?
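One way to put a number on “awake well after 3am” is to treat a night as running from 06:00 to 06:00, so that a 03:30 email counts toward the previous evening, and then take the latest activity per night as a rough bedtime proxy. A minimal sketch; the 06:00 cutoff and the input (local-time `datetime` objects) are assumptions, and last-email time is only a proxy for actual bedtime.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

# Assumed cutoff: activity before 06:00 belongs to the previous night.
DAY_START = timedelta(hours=6)


def latest_activity_per_night(times: list[datetime]) -> dict[date, float]:
    """Map each night to its latest activity, in hours after that night's midnight.

    Values above 24 are after midnight: 27.5 means 03:30 the next morning.
    """
    latest: dict[date, float] = defaultdict(float)
    for t in times:
        shifted = t - DAY_START  # 03:30 on the 12th becomes 21:30 on the 11th
        night = shifted.date()
        hours = shifted.hour + shifted.minute / 60 + DAY_START.total_seconds() / 3600
        latest[night] = max(latest[night], hours)
    return dict(latest)


def share_of_late_nights(latest: dict[date, float], threshold: float = 27.0) -> float:
    """Fraction of nights whose last activity came after `threshold` (27.0 = 03:00)."""
    if not latest:
        return 0.0
    return sum(h > threshold for h in latest.values()) / len(latest)
```

Running `share_of_late_nights` month by month would show whether the October to April stretch really stands out.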

You can see the buildup and drop-off in email traffic at the beginning of 2012 as I was planning TEDxYale. In ’12, like ’13, you can also see a spring break drop-off in the middle of March.

Personal Productivity

The metadata is also useful for personal productivity. Looking at the distribution of my sent emails, it’s striking how email pervades almost every hour of the day.

When do I have class? Sadly, it’s hard to tell, because I am not as disciplined as I should be and don’t ignore email. Activity does spike on Monday and Wednesday around midday, during the one-hour break I had between classes that ran from 10am to 5pm. Interestingly, the pattern has continued: if you want a response from me, midday Wednesday is still my peak email time.
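The weekday-by-hour pattern comes from bucketing each sent timestamp into a 7×24 grid. A minimal Python sketch (the original analysis used R), assuming local-time `datetime` objects; `peak_slot` is a hypothetical helper that reports the busiest cell.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_hour_counts(times: list[datetime]) -> list[list[int]]:
    """Return a 7x24 grid where grid[weekday][hour] is the number of emails sent."""
    counts = Counter((t.weekday(), t.hour) for t in times)
    return [[counts[(d, h)] for h in range(24)] for d in range(7)]


def peak_slot(grid: list[list[int]]) -> tuple[str, int]:
    """Return (weekday name, hour) of the busiest cell; ties go to the earliest."""
    cells = ((d, h) for d in range(7) for h in range(24))
    d, h = max(cells, key=lambda cell: grid[cell[0]][cell[1]])
    return WEEKDAYS[d], h
```

The grid can be handed straight to any heatmap routine, with weekdays as rows and hours as columns.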

I could probably increase productivity by setting a regular email cadence, and only responding to emails at the beginning, middle and end of the day.

While I didn’t cover it here, looking at my Facebook data reveals that Sunday and Monday evening are my least productive times of the week (as measured by an increase in using Facebook).

Plotting my email activity by week over time, a fascinating pattern of peaks and troughs emerges. It appears that I send lots of emails one week and far fewer for the following 3-6 weeks, before surging again. This is the type of pattern I would like to see in my daily email, but week to week it is probably not healthy.
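Weekly totals are a simple group-by on the ISO year and week number. The sketch below is a hypothetical illustration, not the original R code. It also flags “surge” weeks whose count is at least twice the median weekly total; that threshold is an arbitrary assumption.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def weekly_counts(times: list[datetime]) -> dict[tuple[int, int], int]:
    """Count emails per (ISO year, ISO week), in chronological order.

    Weeks with no emails at all are absent from the result.
    """
    per_week = Counter(t.isocalendar()[:2] for t in times)
    return dict(sorted(per_week.items()))


def surge_weeks(counts: dict[tuple[int, int], int], factor: float = 2.0) -> list[tuple[int, int]]:
    """Weeks whose count is at least `factor` times the median weekly count."""
    if not counts:
        return []
    m = median(counts.values())
    return [week for week, n in counts.items() if n >= factor * m]
```

The gaps between consecutive surge weeks give a direct measure of the 3-6 week cycle.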

Closing Thoughts

As we live more of our lives on the web, we develop a rich personal history hidden in our metadata. This data is valuable for us, the individuals creating it, not just the NSA. Visualizing it at scale enables us to self-reflect and fully internalize our habits. We need more applications like Immersion that help us see these patterns.

Your Data

I encourage you to try the above visualizations on your own data. Below are the steps I took to generate the graphs:

  1. Download your Gmail header data from Increment n from 0 by 1 until you get a 404. Save each page as n.json. I had 5 pages and I think there are roughly 10k emails per page.
  2. Use this Ruby script to combine the set of .json files into a single CSV.
  3. Use this R script to clean up the time stamps and plot the graphs.
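Steps 1 and 2 can also be sketched in Python. This is not the author’s Ruby script: the download address is left as a caller-supplied template (`url_template` is hypothetical and must contain `{n}`), and each page is assumed to be a JSON list of flat objects, one per email, whose keys become the CSV columns.

```python
import csv
import json
import urllib.error
import urllib.request
from pathlib import Path
from typing import Callable, Optional


def fetch_page(url_template: str, n: int) -> Optional[bytes]:
    """Fetch page n; return None on HTTP 404, which signals the end of the data."""
    try:
        with urllib.request.urlopen(url_template.format(n=n)) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def download_pages(fetch: Callable[[int], Optional[bytes]], out_dir: Path) -> list[Path]:
    """Save pages 0, 1, 2, ... as n.json until fetch returns None."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    n = 0
    while (body := fetch(n)) is not None:
        path = out_dir / f"{n}.json"
        path.write_bytes(body)
        paths.append(path)
        n += 1
    return paths


def combine_to_csv(paths: list[Path], csv_path: Path) -> int:
    """Merge the JSON pages into one CSV and return the number of rows written."""
    rows = []
    for path in sorted(paths, key=lambda p: int(p.stem)):
        rows.extend(json.loads(path.read_text()))
    # Union of keys in first-seen order, so no field is silently dropped.
    fields = list(dict.fromkeys(key for row in rows for key in row))
    with csv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `paths = download_pages(lambda n: fetch_page(URL_TEMPLATE, n), Path("pages"))` followed by `combine_to_csv(paths, Path("emails.csv"))`, where `URL_TEMPLATE` is a placeholder for the real download address. Passing the fetch function in keeps the 404-terminated loop testable without a network connection.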

Last September, Cleo took me skydiving. This year, she took me paragliding in Turkey! A few weeks before, I also went bungee jumping in New Zealand with Paul. Up next … wing suits? Karate skydiving?

As a final project for Yale Law Tech, I set out to discover how much of my digital data footprint I could access, and whether I could find anything interesting in the noise. I looked at the two primary web services I use, Twitter and Facebook, as well as all of my phone geolocation data.

Twitter

Late last year, Twitter built the functionality that allows users to download their archives. When you do, you get both a set of JSON files and a CSV. The CSV is really easy to read into R and also very easy to use with Excel. In R, with a few timestamp conversions, I was able to plot a set of visuals that take my almost two thousand tweets and try to make sense of the noise.
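The timestamp conversion is the only fiddly part. A minimal Python sketch, assuming the archive’s `tweets.csv` has a `timestamp` column formatted like `2013-05-01 18:30:00 +0000`; both the column name and the format are assumptions, so check the header of your own export.

```python
import csv
from datetime import datetime, timezone

# Assumed column name and format for the archive's tweets.csv.
TIMESTAMP_COLUMN = "timestamp"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def tweet_times(csv_file, tz: timezone) -> list[datetime]:
    """Read tweet timestamps from an open CSV file and convert them to `tz`."""
    reader = csv.DictReader(csv_file)
    return [
        datetime.strptime(row[TIMESTAMP_COLUMN], TIMESTAMP_FORMAT).astimezone(tz)
        for row in reader
        if row.get(TIMESTAMP_COLUMN)  # skip rows with a missing timestamp
    ]
```

Usage: open the file with `open("tweets.csv", newline="", encoding="utf-8")` and pass it in with a zone such as `timezone(timedelta(hours=-4))`. The resulting local times can feed the same hour and weekday bucketing used for email.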

The fall after I started using Twitter, I tweeted way too much. Thankfully, that activity has diminished over time. What this doesn’t show, though, is how much I use Twitter as a source of news and information. I would really like Twitter to give me access to all of my session times. That data would let me discover both my reading and procrastination habits, and could potentially back up my intuitions about my productivity (or lack thereof) with hard numbers.

I clearly need to get a little more sleep! And 4pm Monday and Thursday are clearly not my most productive hours of the week.

You can probably guess that I go to Yale, have a fascination for TED talks, and generally like ‘awesome’ things.
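Those guesses come from the words that keep recurring in the tweet text. A rough word count like the following would surface them. It is an illustrative sketch: the tokenization rule and the small stop-word list are assumptions, not a standard.

```python
import re
from collections import Counter

# Small illustrative stop-word list (assumption); extend as needed.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "for", "is", "on", "at", "my", "i", "rt"}
# Tokens are runs of lowercase letters, digits, and apostrophes.
TOKEN = re.compile(r"[a-z0-9']+")


def top_words(texts: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most common non-stop-words across all tweet texts."""
    counts = Counter()
    for text in texts:
        # Drop URLs and @mentions before tokenizing.
        text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        words = (w.strip("'") for w in TOKEN.findall(text))
        counts.update(w for w in words if w and w not in STOP_WORDS)
    return counts.most_common(n)
```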

Facebook

Facebook allows you to download an expanded archive of your data. However, while it is a good amount of data, it is tiny in comparison to the true size of my digital footprint on Facebook. When you download the archive, you get a set of HTML files that you can load locally, basically giving you a little static Facebook with your personal information. The most interesting information they make accessible is a subset of your account activity that includes session timestamps for the past few months. They provided me with sessions back to February 10th, 2013. It takes a little work to strip out all the HTML and get just the timestamps, but it can be done, and the code is below. Once that is done, we can visualize some of my habits, as with Twitter.
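A minimal Python take on the stripping step, using the standard library’s `html.parser`. The archive’s markup and date format are assumptions here: the sketch expects timestamps like `Sunday, May 12, 2013 at 10:15pm EDT`, so adjust `SESSION_RE` to whatever your export actually contains.

```python
import re
from datetime import datetime
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, discarding all tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


# Assumed timestamp pattern, e.g. "Sunday, May 12, 2013 at 10:15pm EDT".
SESSION_RE = re.compile(r"[A-Z][a-z]+day, ([A-Z][a-z]+ \d{1,2}, \d{4}) at (\d{1,2}:\d{2}[ap]m)")


def session_times(html: str) -> list[datetime]:
    """Strip the HTML, then parse every timestamp matching SESSION_RE.

    The zone abbreviation is ignored, so results are naive local times.
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return [
        datetime.strptime(f"{day} {clock}", "%B %d, %Y %I:%M%p")
        for day, clock in SESSION_RE.findall(text)
    ]
```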

The Facebook heatmap is made with less data than the Twitter one, but it offers a useful and different perspective. The Twitter data reflects my public actions, whereas the Facebook session timestamps reflect my private viewing habits. Sunday is clearly a terribly unproductive day. Around 10pm on Monday evening, I also start to lose concentration. Good habits to know! It would be much better if I could map this over time, with much more data, to get a more accurate picture of my habits.

Open Paths

I downloaded Open Paths after watching Jer Thorp’s TED talk on making data more human over a year ago. The app has been running in the background, tracking my location, since March 8th, 2012. With over a year of data now, I thought I would finally turn the matrix of latitude/longitude coordinates into a map.
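One lightweight way to make the coordinates mappable is to write them out as a GeoJSON `LineString`, which tools such as geojson.io or QGIS can render. A sketch, assuming an Open Paths CSV export with `lat`, `lon`, and `date` columns and ISO-style dates (verify against your own file). Note that GeoJSON orders each coordinate as longitude, then latitude.

```python
import csv
import json


def points_from_csv(csv_file) -> list[tuple[str, float, float]]:
    """Read (date, lat, lon) rows from an open CSV file, sorted by date."""
    rows = [(r["date"], float(r["lat"]), float(r["lon"])) for r in csv.DictReader(csv_file)]
    return sorted(rows)  # assumes ISO-style dates, which sort lexically


def to_geojson_line(points: list[tuple[str, float, float]]) -> dict:
    """Build a GeoJSON Feature whose LineString follows the points in order."""
    return {
        "type": "Feature",
        "properties": {"start": points[0][0], "end": points[-1][0]} if points else {},
        "geometry": {
            "type": "LineString",
            # GeoJSON (RFC 7946) positions are [longitude, latitude].
            "coordinates": [[lon, lat] for _, lat, lon in points],
        },
    }


def write_geojson(feature: dict, path: str) -> None:
    """Wrap the feature in a FeatureCollection and write it to disk."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"type": "FeatureCollection", "features": [feature]}, f)
```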

If you combine the Open Paths data with some of the Facebook data, you could quickly figure out that the Northeast Corridor trips are to visit a special someone in New York. Add the tweets that mention Yale, and it’s not hard to figure out how I spend quite a bit of my time: commuting back and forth.
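Spotting those corridor trips programmatically comes down to labeling each point by its distance to a couple of reference locations and counting label changes. The sketch below uses the haversine great-circle formula; the city-center coordinates, the 15 km radius, and the trip definition (a move from one labeled city to the other) are illustrative assumptions.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Optional

EARTH_RADIUS_KM = 6371.0
# Approximate city-center coordinates (assumption).
PLACES = {"New Haven": (41.3083, -72.9279), "New York": (40.7128, -74.0060)}


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def label(point: tuple[float, float], radius_km: float = 15.0) -> Optional[str]:
    """Name of the reference place within radius_km of point, if any."""
    for name, center in PLACES.items():
        if haversine_km(point, center) <= radius_km:
            return name
    return None


def count_trips(points: list[tuple[float, float]]) -> int:
    """Count moves from one labeled place to a different one (input in time order)."""
    trips, last = 0, None
    for p in points:
        here = label(p)
        if here is None:
            continue  # in transit or somewhere else
        if last is not None and here != last:
            trips += 1
        last = here
    return trips
```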

Final Notes

Twitter definitely wins my vote for most accessible personal data. However, I am glad that Facebook gives me session times, and I would really like the same for Twitter. In the future, I want to figure out how to get timestamps for all my emails, since email is a private and constant method of communication that I think will highlight lots of my personal habits. I also think it would be fascinating to map out my banking transactions over time, since all of my credit transactions are recorded and easily exportable from Bank of America. On that note, I leave you with a link to Stephen Wolfram’s post on personal analytics, which inspired a lot of this.

Here is all the R code I wrote to generate the visuals