This post was originally published on Medium

Uber has received a lot of negative press lately for invasions of user privacy. Employees supposedly had access to a ‘God View’ of every user (allowing near-unlimited access to the real-time locations and personal details of its user base), and considered exposing the trip histories of unfriendly journalists. The Uber Android app has been shown to collect all sorts of unnecessary data. Thankfully, Congress has spoken out, and Uber is putting stricter data controls in place.

Data control is critically important, but personal data freedom is also important and often overlooked. I have written before about how, as more of our daily lives are mediated by digital products, we create a rich personal history. This is especially true of social media sites like Facebook and Twitter, but also true of more “analog” products like NYC Citibike. As creators of this personal history, we should have open access to and creative freedoms with our own data. Tim Berners-Lee recently raised this issue, saying that “the data we create about ourselves should be owned by each of us, not by the large companies that harvest it.”

Facebook, Twitter and Google now provide personal data exports, and Uber, along with many other services, should consider doing the same as part of new data policies.

We should consider the equivalent of a Creative Commons License for our personal data that every service would strive to adopt:

You are free to:

  1. Download — free access to your raw data in standard file formats

  2. Share — copy and redistribute the data in any medium or format

  3. Adapt — remix, transform, and build upon the data

Under the following terms:

  1. Attribution — You must provide a sign-up link to the application

Data freedom shouldn’t rely on the independent motivation and hard work of individuals like Josh Hunt and Chris Wong, who wrote code to scrape Uber.

With my raw Uber data, I can create visualizations of my entire travel history.

I can also analyze it however I want, e.g., total trips over time and total spend over time.

And I can query it for exactly the information I want:

  • Total Trips: 173 (9 cancels)
  • Total Distance: 883.35 miles (average trip 5.2 miles)
  • Total Time: 47hrs 45mins 43secs (average trip: 16mins 51secs)
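As a rough sketch of what querying your own export could look like, the snippet below loads trips into an in-memory SQLite table and runs one summary query. The table layout (`distance_miles`, `duration_secs`, `status`) and the sample rows are assumptions made up for illustration; they are not Uber's actual export format, and they are not the trips summarized above.

```python
import sqlite3

# Hypothetical sample rows: (distance_miles, duration_secs, status).
# Made-up values for illustration only.
SAMPLE_TRIPS = [
    (5.0, 900, "completed"),
    (3.5, 720, "completed"),
    (0.0, 0, "canceled"),
    (8.0, 1500, "completed"),
]


def summarize(trips):
    """Return trip totals and per-trip averages (averages use completed trips only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips (distance_miles REAL, duration_secs INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", trips)
    total, cancels, completed, miles, secs = conn.execute(
        """
        SELECT COUNT(*),
               COALESCE(SUM(CASE WHEN status = 'canceled' THEN 1 ELSE 0 END), 0),
               COALESCE(SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END), 0),
               COALESCE(SUM(CASE WHEN status = 'completed' THEN distance_miles ELSE 0 END), 0),
               COALESCE(SUM(CASE WHEN status = 'completed' THEN duration_secs ELSE 0 END), 0)
        FROM trips
        """
    ).fetchone()
    conn.close()
    return {
        "total_trips": total,
        "cancels": cancels,
        "total_miles": miles,
        "total_secs": secs,
        # Avoid dividing by zero when there are no completed trips.
        "avg_miles": miles / completed if completed else None,
        "avg_secs": secs / completed if completed else None,
    }


summary = summarize(SAMPLE_TRIPS)
print(summary)
```

Whether cancels should count toward the averages is a choice; this sketch leaves them out.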

As Congress works with companies like Facebook, Twitter, and Uber to shape user privacy rights, we should also push them to give us unfiltered access to and creative freedom with our own data. If Uber employees can run a SQL query on my personal data, I should be able to as well.

Uber leaked select city-by-city revenue data last week. The numbers were impressive: $26MM in top line revenue in NYC alone for December 2013.

But just how big is Uber relative to the NYC taxi market? Chris Wong FOILed all of NYC’s taxi trip data for 2013, which is now in Google BigQuery for easy SQL querying. That makes it possible to compare Uber’s usage with that of NYC taxis.

1. Uber NYC earned 18% as much as NYC taxis on Dec 31, 2013

  • Uber 2013: $1,118,271
  • Uber 2012: $182,819
  • Taxis: $6,272,548

2. Uber NYC earned 12% as much as NYC taxis in December 2013

  • Uber: ~$26MM
  • Taxis: $211,328,661.33

3. Uber NYC had 7% as many rides as NYC taxis on Dec 31, 2013

  • Uber 2013: 32,547
  • Uber 2012: 4,785
  • Taxis: 467,587

4. Uber NYC’s average ride was 256% of the average NYC taxi fare on Dec 31, 2013

  • Uber NYE 2013: $34.36
  • Taxis NYE 2013: $13.41
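The percentages above can be reproduced from the listed figures. This is a small arithmetic check, with the constant names chosen here for illustration and the December Uber revenue taken as the approximate ~$26MM stated above:

```python
# Figures copied from the comparison above.
UBER_NYE_2013_REVENUE = 1_118_271
TAXI_NYE_2013_REVENUE = 6_272_548
UBER_DEC_2013_REVENUE = 26_000_000  # approximate ("~$26MM")
TAXI_DEC_2013_REVENUE = 211_328_661.33
UBER_NYE_2013_RIDES = 32_547
TAXI_NYE_2013_RIDES = 467_587


def pct(part, whole):
    """Express part as a whole-number percentage of whole."""
    return round(100 * part / whole)


nye_revenue_share = pct(UBER_NYE_2013_REVENUE, TAXI_NYE_2013_REVENUE)  # 18
dec_revenue_share = pct(UBER_DEC_2013_REVENUE, TAXI_DEC_2013_REVENUE)  # 12
nye_ride_share = pct(UBER_NYE_2013_RIDES, TAXI_NYE_2013_RIDES)         # 7

# Average fare is revenue divided by rides for the same day.
uber_avg_fare = round(UBER_NYE_2013_REVENUE / UBER_NYE_2013_RIDES, 2)  # 34.36
taxi_avg_fare = round(TAXI_NYE_2013_REVENUE / TAXI_NYE_2013_RIDES, 2)  # 13.41
fare_ratio = pct(uber_avg_fare, taxi_avg_fare)                         # 256

print(nye_revenue_share, dec_revenue_share, nye_ride_share, fare_ratio)
```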

NYC Taxi Data Queries In Google BigQuery

  1. December 2013 Total Revenue

    SELECT SUM(FLOAT(total_amount)) FROM [833682135931:nyctaxi.trip_fare] WHERE INTEGER(YEAR(TIMESTAMP(pickup_datetime))) = 2013 AND INTEGER(MONTH(TIMESTAMP(pickup_datetime))) = 12

  2. December 31, 2013 Total Revenue

    Same as query 1, with one more condition appended: AND INTEGER(DAY(TIMESTAMP(pickup_datetime))) = 31

  3. December 31, 2013 Total Trips

    Same as query 2, with SELECT COUNT(*) in place of SELECT SUM(FLOAT(total_amount))
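Since the three queries differ only in the SELECT expression and the day filter, they can be written out in full from one template. The helper names below (`date_filter`, `build_query`) are made up for this sketch; the table name and the legacy-SQL functions are the ones used in query 1.

```python
TABLE = "[833682135931:nyctaxi.trip_fare]"  # table reference as used in query 1


def date_filter(year, month, day=None):
    """Build the WHERE conditions on pickup_datetime; the day is optional."""
    parts = [
        f"INTEGER(YEAR(TIMESTAMP(pickup_datetime))) = {year}",
        f"INTEGER(MONTH(TIMESTAMP(pickup_datetime))) = {month}",
    ]
    if day is not None:
        parts.append(f"INTEGER(DAY(TIMESTAMP(pickup_datetime))) = {day}")
    return " AND ".join(parts)


def build_query(select_expr, year, month, day=None):
    """Assemble one query string from a SELECT expression and a date filter."""
    return f"SELECT {select_expr} FROM {TABLE} WHERE {date_filter(year, month, day)}"


december_revenue = build_query("SUM(FLOAT(total_amount))", 2013, 12)
nye_revenue = build_query("SUM(FLOAT(total_amount))", 2013, 12, 31)
nye_trips = build_query("COUNT(*)", 2013, 12, 31)

for query in (december_revenue, nye_revenue, nye_trips):
    print(query)
```

The values are interpolated directly into the string, which is fine for hard-coded integers like these but not for untrusted input.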


Predictions are dangerous to make, but I’m going to try anyway: I think AI assistants will be one of the next great enterprise SaaS opportunities. Here are a few reasons why:

1. A large existing market to provide a cheaper substitute for

~$110BN+ of secretary and EA labor costs

  • 2.16MM secretaries and administrative assistants in the US making on average $34k per year for a total of $73.41BN in wages
  • 755,210 executive secretaries and executive administrative assistants making on average $51.87k for a total of $39.17BN
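The headline figure is the sum of the two wage totals. A quick check, using the secretary total as stated above and recomputing the executive total from its headcount and average wage (the variable names are just for this sketch):

```python
# Figures from the two bullets above.
SECRETARY_WAGES_BN = 73.41  # total as stated, in $BN
EXEC_HEADCOUNT = 755_210
EXEC_AVG_WAGE = 51_870      # "$51.87k"

# Headcount times average wage, converted to billions of dollars.
exec_wages_bn = round(EXEC_HEADCOUNT * EXEC_AVG_WAGE / 1e9, 2)  # 39.17
total_bn = round(SECRETARY_WAGES_BN + exec_wages_bn, 2)         # 112.58

print(f"Executive wages: ${exec_wages_bn}BN, combined: ${total_bn}BN")
```

The combined total of roughly $112.6BN is where the “~$110BN+” comes from.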

I don’t have evidence for it, but my intuition is that scheduling is becoming a challenge for more people than ever before:

  • Teams are increasingly distributed
  • More work is done on the go but scheduling is hard to do on mobile
  • More of the workforce engages in higher cognitive activities that require significant collaboration

2. Freemium product & market expansion

  • An artificial assistant would substitute for existing labor, but the lower price point would also expand the addressable population: the product fits within the budget of more employees and can increase their productivity

  • A free self-serve product may make it compelling for individuals to try a personal assistant when the upfront cost used to be too high to even consider. Free power users can naturally convert to paid users based on utilization thresholds

3. The infrastructure is ready

Most modern companies are now on the Google Apps suite, which is comparatively simple for third-party technology to integrate with and for users to authorize (see RelateIQ, Mailbox, Streak CRM, etc.)

I haven’t diligently studied the latest advances, but anecdotes suggest that natural language processing is now (or soon will be) smart enough to parse and engage in conversation:

  • Siri and voice recognition have made significant advances, and text is much easier to parse than voice
  • IBM’s Watson parsed Jeopardy! questions (though those have a defined answer format)
  • RelateIQ intelligently parses emails and suggests follow ups you have missed

4. Winner-takes-most market

This is pure conjecture, but I think there is a compelling case that this is a winner-takes-most market. My initial basis for this argument is that:

  • The breakout product will own a powerful distribution channel (signatures in email) that generates the most efficient customer acquisition (similar to how SurveyMonkey acquires customers through each survey)
  • This is a data product, and the early breakout will have the biggest head start on supervised learning which may enable a fundamentally better product
  • I think that bot-to-bot scheduling handled by the same company’s AI will end up with a superior experience to one where two different AIs are trying to compete (though that could be hilarious)


I think the leading company will differentiate through better technology and exceptional marketing. EAs handle a wide range of tasks, but the winner will start with an exclusive focus on scheduling and expand from there as necessary.