[Mar. 30th, 2009 | 10:16 pm]
So my boss is interested in some stats on our data archive.
I have access to a SQL database that has details of every single download transaction. So I can definitely pull some stats together. The question is... what?
I mean, I can slice and dice the data in a million different ways. I can count the number of users, the total data volume downloaded, the most popular files... what else would be interesting?
Jerry suggested looking at the number of file downloads per user, which might show some interesting patterns.
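For what it's worth, here is a rough sketch of what Jerry's idea might look like as a query. The table and column names (`downloads`, `user_id`, `file_name`, `bytes`) and the sample rows are all made up for illustration, since the real schema isn't shown here, and it uses an in-memory SQLite database as a stand-in for the actual one.

```python
import sqlite3

# Hypothetical schema and sample rows: the real transaction table will differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE downloads (user_id TEXT, file_name TEXT, bytes INTEGER)")
conn.executemany(
    "INSERT INTO downloads VALUES (?, ?, ?)",
    [
        ("alice", "a.nc", 100),
        ("alice", "b.nc", 200),
        ("alice", "a.nc", 100),
        ("bob", "a.nc", 100),
        ("carol", "c.nc", 50),
    ],
)


def downloads_per_user(conn):
    """Return (user_id, download_count) rows, heaviest users first."""
    return conn.execute(
        "SELECT user_id, COUNT(*) AS n FROM downloads "
        "GROUP BY user_id ORDER BY n DESC, user_id"
    ).fetchall()


def download_count_distribution(conn):
    """Return (download_count, number_of_users) rows: how many users made exactly n downloads."""
    return conn.execute(
        "SELECT n, COUNT(*) FROM "
        "(SELECT COUNT(*) AS n FROM downloads GROUP BY user_id) AS per_user "
        "GROUP BY n ORDER BY n"
    ).fetchall()


print(downloads_per_user(conn))
print(download_count_distribution(conn))
```

The second query is probably the more interesting one: it would show whether most users grab one file and leave, or whether a handful of heavy users account for most of the traffic.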
I am probably too late on this, but some offhand thoughts:
You want your data aggregation to construct and/or support a narrative. Perception is better when there is a narrative with some dramatic tension. So you need to decide how the people who might fund your group will want to see things, and what dramatic tension they're interested in.
The tension they're interested in is probably some kind of buzzword or popular political movement that makes them feel like you're working on exactly what they've been talking about.
You group users and your data into "characters" which interact with each other and the dramatic tension in ways that advance the narrative.
Here's an arbitrary and fictional example:
Ten years ago there was a gap in atmospheric modeling which failed to factor in some technologies which were emerging in transportation and the energy grid. This caused people to be uncertain of how much or how little adoption or advancement of these technologies would benefit things in the long term, even though they seemed promising in the short term.
So people who work for or think about utilities and transportation began seeking the kind of data that would allow their models to include these kinds of changes. Thus, five years ago you see a spike in these people doing research. Over time, they began to refine their queries to a specific area of data which turned out to be incredibly pertinent, and allowed them to construct the models which directed funding and interest into long-term solutions and abandoned those which only appeared to have an effect on the problem.
Recently, you can see a similar trend with data regarding clean coal, and similar patterns are emerging, which we can expect to be repeated as long as there are new technologies for which measuring the long-term impact requires large amounts of data which can be analyzed in specific ways--exactly what you provide.
That kind of thing. But what you actually provide is a cross-section of the interests and/or employers of users combined with the data sets they tend to use. If, you know, that's the narrative and tension you've come up with.
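As a very rough sketch of that cross-section, assuming you can tie each download to a user affiliation and a data set: the tables, columns, and rows below (`users`, `affiliation`, `dataset`, `year`) are invented for the example, not anything from your actual database.

```python
import sqlite3

# Hypothetical tables and made-up rows, purely for illustration.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE users (user_id TEXT PRIMARY KEY, affiliation TEXT);
    CREATE TABLE downloads (user_id TEXT, dataset TEXT, year INTEGER);
    """
)
demo.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("u1", "utility"), ("u2", "utility"), ("u3", "university")],
)
demo.executemany(
    "INSERT INTO downloads VALUES (?, ?, ?)",
    [
        ("u1", "aerosols", 2004),
        ("u2", "aerosols", 2004),
        ("u1", "aerosols", 2005),
        ("u3", "precipitation", 2004),
    ],
)


def interest_by_group(conn):
    """Count downloads per (affiliation, dataset, year) so trends by user group show up."""
    return conn.execute(
        "SELECT u.affiliation, d.dataset, d.year, COUNT(*) "
        "FROM downloads AS d JOIN users AS u ON u.user_id = d.user_id "
        "GROUP BY u.affiliation, d.dataset, d.year "
        "ORDER BY u.affiliation, d.dataset, d.year"
    ).fetchall()


for row in interest_by_group(demo):
    print(row)
```

Each output row is one "character" (a user group) interacting with one data set in one year, which is the raw material for spotting the kind of spike described above.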
Dear god that was longer than I expected. Sorry. Hope it's at least vaguely useful.