Storing Time-Series Twitter Engagement Data in a Graph Database
I wanted to better understand how tweets get traction, and how that contributes to people’s follower counts. So I wrote some Node.js code that connects to the Twitter API every so often, looks at the people I follow and their tweets, and saves the results into my favorite graph database software, Neo4j.
Editor’s Note: Neo4j is not paying me for this post, but they totally should.
It took some trial and error, but I’m pretty happy with the schema I’m using. Here it is.
Every time the data is fetched from the Twitter API, a new “Timestamp” node is created. The User node and the Tweet node each get a connection to that Timestamp node, and the actual stats about the user and the tweet are stored in those two connections. That’s the magic part, so I’ll reiterate: the time-series data is stored inside the connection, not the node.
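If you want to see roughly what one of those writes could look like, here is a minimal sketch. It is not my actual script: the labels (`User`, `Tweet`, `Timestamp`), the relationship types (`POSTED`, `SNAPSHOT_AT`), the property names, and the helper `buildSnapshotStatement` are all made up for illustration. It only builds the Cypher text and its parameters, so it runs without a database.

```javascript
// Sketch only: every label, relationship type, and property name below is an
// assumption made for illustration, not the schema's real naming.
function buildSnapshotStatement(fetchedAt, user, tweet) {
  const query = [
    // Reuse the user and tweet nodes if they already exist.
    "MERGE (u:User {id: $userId})",
    "MERGE (t:Tweet {id: $tweetId})",
    "MERGE (u)-[:POSTED]->(t)",
    // One Timestamp node per fetch; MERGE lets every user and tweet
    // saved during the same run share it.
    "MERGE (ts:Timestamp {fetchedAt: $fetchedAt})",
    // The stats live on the connections, not on the nodes.
    "CREATE (u)-[:SNAPSHOT_AT {followers: $followers}]->(ts)",
    "CREATE (t)-[:SNAPSHOT_AT {favorites: $favorites, retweets: $retweets}]->(ts)",
  ].join("\n");

  const params = {
    fetchedAt: fetchedAt.toISOString(), // ISO strings sort chronologically
    userId: user.id,
    followers: user.followers,
    tweetId: tweet.id,
    favorites: tweet.favorites,
    retweets: tweet.retweets,
  };

  return { query, params };
}

// Example with invented numbers.
const example = buildSnapshotStatement(
  new Date("2020-01-01T12:00:00Z"),
  { id: "max", followers: 1000 },
  { id: "tweet-1", favorites: 1, retweets: 0 }
);
console.log(example.query);
console.log(example.params);
```

With the official Neo4j JavaScript driver, a statement like this would typically be handed to `session.run(query, params)`. Because the counts sit on the `SNAPSHOT_AT` connections, each fetch adds new connections instead of overwriting old numbers.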
(The files are insiiiiiiide the computer….)
Let me explain with pictures.
For this example let’s use one of my favorite high-profile data scientists, Max. Here you can see Max (in green) posted a tweet (in blue). But let’s look at those connections to the tan timestamp node below.
In the screenshot above, I’ve clicked on the connection between the tweet and the timestamp, so you can see the data it contains. You can see it’s got one favorite and no retweets.
Now, in real life I go ahead and like that tweet, run my script again, and let’s see what happens…
Holy shit it worked! There’s another timestamp node. And when you click the connection between that new timestamp and the tweet, you can see the new favorite count.
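Clicking around the graph is fun, but you can also pull the whole history for one tweet with a query. Here is a rough sketch of that, again assuming hypothetical names (`Tweet`, `Timestamp`, `SNAPSHOT_AT`, `fetchedAt`, `favorites`, `retweets`) and a made-up helper, `buildTweetHistoryStatement`. It assumes `fetchedAt` was stored as an ISO 8601 string, so ordering by it gives chronological order.

```javascript
// Sketch only: labels, relationship type, and property names are assumptions.
function buildTweetHistoryStatement(tweetId) {
  const query = [
    // Walk from the tweet to every Timestamp it has been snapshotted at.
    "MATCH (t:Tweet {id: $tweetId})-[s:SNAPSHOT_AT]->(ts:Timestamp)",
    // The counts come off the connection (s), the time off the node (ts).
    "RETURN ts.fetchedAt AS fetchedAt, s.favorites AS favorites, s.retweets AS retweets",
    "ORDER BY ts.fetchedAt",
  ].join("\n");

  return { query, params: { tweetId } };
}

const history = buildTweetHistoryStatement("tweet-1");
console.log(history.query);
```

Each returned row is one fetch, so the rows read as a little time series of favorites and retweets for that tweet.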
So that’s the basic technique. You can see that the same approach works for tracking follower growth too. For example:
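Once the follower snapshots are out of the graph, turning them into growth numbers is plain JavaScript. A small sketch, assuming each snapshot is an object with a `fetchedAt` ISO string and a `followers` count (both names are my invention here, as is the helper `followerGrowth`, and the numbers are made up):

```javascript
// Sketch only: the snapshot shape ({ fetchedAt, followers }) is an assumption.
function followerGrowth(snapshots) {
  // Copy before sorting so the caller's array is left untouched.
  const ordered = [...snapshots].sort((a, b) =>
    a.fetchedAt < b.fetchedAt ? -1 : a.fetchedAt > b.fetchedAt ? 1 : 0
  );

  return ordered.map((snap, i) => ({
    fetchedAt: snap.fetchedAt,
    followers: snap.followers,
    // The first snapshot has nothing to compare against.
    change: i === 0 ? null : snap.followers - ordered[i - 1].followers,
  }));
}

// Invented sample data, deliberately out of order.
const growth = followerGrowth([
  { fetchedAt: "2020-01-02T12:00:00.000Z", followers: 105 },
  { fetchedAt: "2020-01-01T12:00:00.000Z", followers: 100 },
  { fetchedAt: "2020-01-03T12:00:00.000Z", followers: 103 },
]);
console.log(growth);
```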
There are more complicated ways of storing time-series data in graphs (Google “time trees”) but this simple approach seems to be working well for me, and will hopefully yield some interesting results.
So that’s it. That’s the post.
(PSSST: I’m having lots of fun with this nerdy graph stuff and will be posting more in the future. For updates you should follow me on Twitter, where you bet your ass I’m tracking my follower count using this technique.)