so I was just minding my own business working on a post archiver when I figured I’d pull down my list of followers.
if my hunch is correct, then it appears the longer you have a tumblr, the more FLAGRANTLY WRONG your reported follower count is.
come with me on a journey of software incompetence.
Tumblr says that I have 474 followers:
however, a weird thing happened when I started pulling them down via the Tumblr API
(an API is a special “side door” that companies like Tumblr, Facebook, etc., can create for programmers to build apps using their functionality. they provide documentation and access instructions, and programmers can follow them to “use” Tumblr through a programming language instead of the official apps and websites.)
I wouldn’t have noticed the weird thing, except I’m rather fastidious about data integrity, so I had my database configured with something called a unique primary key constraint, which means it will stop everything and scream loudly if there was any unexpected duplication of data.
(most web programmers follow the conventions of whatever toolset they’re using; for example, the wildly popular web framework Ruby on Rails has historically leaned on application-level validations rather than database constraints, so checks like this are easy to skip.)
APIs are organized into pieces called endpoints, or methods. you can see in the docs for the “followers” endpoint that you can only request 20 at a time.
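for the programmers in the audience, here's a rough sketch of what "20 at a time" means in practice. this is not my actual archiver code: `fetch_page` and the fake follower list are hypothetical stand-ins for the real API call, so the example runs without a network connection or an API key.

```python
from typing import Callable, Iterator, List

PAGE_SIZE = 20  # the per-request maximum from the docs


def fetch_all_followers(
    fetch_page: Callable[[int, int], List[str]]
) -> Iterator[str]:
    """Yield follower names one page at a time until an empty page comes back."""
    offset = 0
    while True:
        page = fetch_page(offset, PAGE_SIZE)
        if not page:
            break
        yield from page
        offset += len(page)


# hypothetical stand-in for the real endpoint: 45 made-up followers
FAKE_FOLLOWERS = [f"blog-{i}" for i in range(45)]


def fake_fetch_page(offset: int, limit: int) -> List[str]:
    # a well-behaved API returns each follower exactly once across pages
    return FAKE_FOLLOWERS[offset:offset + limit]


followers = list(fetch_all_followers(fake_fetch_page))
print(len(followers))
```

with a well-behaved endpoint, three requests (20, 20, and 5 followers) cover everyone exactly once.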
strangely, I was receiving the same followers multiple times! I updated my code to detect duplicate followers, print a message, and skip them. when I ran it again, I saw that I was receiving so many duplicates that sometimes I would receive an entire page of them.
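here's a minimal sketch of that "scream on duplicates, then skip them" behavior. it assumes SQLite (via Python's built-in `sqlite3` module) and a `followers` table keyed on `name`; the incoming names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# the primary key makes the database reject a second row with the same name
conn.execute("create table followers (name text primary key)")

incoming = ["alice", "bob", "alice", "carol", "bob"]  # made-up API results

duplicates = 0
for name in incoming:
    try:
        conn.execute("insert into followers (name) values (?)", (name,))
    except sqlite3.IntegrityError:
        # the constraint fired: report the duplicate and move on
        duplicates += 1
        print(f"duplicate follower, skipping: {name}")

unique_count = conn.execute("select count(*) from followers").fetchone()[0]
print(unique_count, duplicates)
```

without the `try`/`except`, the first repeated name would stop the whole run, which is how the problem got noticed in the first place.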
once the process was finished, I connected directly to my database and queried the amount of (unique) followers it had saved:
“that’s crazy”, I thought. “no way is Tumblr reporting over 100 nonexistent followers. that’s a completely egregious error on one of the core features of the site!”
so I stubbornly went to Tumblr and manually paged through my list of followers to see if they matched up by querying my database on the side and comparing the lists:
that’s Page 1. if you look at the query I used:
select name from followers order by rowid limit 40 offset 0;
“limit” is how many followers to view, and “offset” is how many followers to skip. indeed, there were 40 followers on the first page, mostly from the burst of attention my blog got when my “She and Her Cat: Everything Flows” post went viral.
I assumed that I would be able to just scroll down, verify that everything matched, hit “Next Page” on Tumblr, and add 40 to my offset. that is, I assumed that Tumblr would show me 40 followers per page. instead, these are the limits and offsets I had to use to exactly duplicate what Tumblr showed me:
- 40 (0)
- 40 (40)
- 40 (80)
- 40 (120)
- 24 (160)
- 21 (184)
- 9 (205)
- 25 (214)
- 23 (239)
- 32 (262)
- 35 (294)
- 32 (329)
that’s 12 pages of followers. but only the first four pages displayed 40. and if you add up all of those limits (the non-parenthesized numbers, that is, how many followers I saw on each page), you come up with… you guessed it:
in other words, Tumblr displays my follower count as 474, but it only showed me 361 followers.
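if you'd like to check the arithmetic, this small sketch rebuilds each page's query from the per-page counts listed above and adds them up. the only inputs are those page sizes and the reported count of 474.

```python
# followers shown on each of the 12 pages
page_sizes = [40, 40, 40, 40, 24, 21, 9, 25, 23, 32, 35, 32]

offset = 0
for limit in page_sizes:
    # each page starts where the previous one ended
    print(f"select name from followers order by rowid limit {limit} offset {offset};")
    offset += limit

shown = sum(page_sizes)
reported = 474
print(shown, reported - shown)
```

the running offsets match the numbers in parentheses, and the gap between reported and shown comes out to 113.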
So… what’s going on?!?
Well, we can make a few observations about the follower counts per page displayed on Tumblr itself:
- we got four pages of 40 followers each, then things started getting dicey.
- the maximum followers per page was 40
- the minimum was 9
additionally, it’s worth noting that I received a burst of a little over 100 new followers as a result of that viral post.
so right now, my hunch is this: when an account following you is deleted (whether because it turned out to be spam or because it was removed by the owner), it’s removed from your follower list, but your follower count never decreases.
I’ll talk about why things would work this way in a minute, but first, let’s run an experiment to test my theory. this won’t prove anything for sure, since the behavior might change based on timespan (minutes vs. weeks), but let’s give it a shot anyway.
first, I used an Incognito Window to create a new account so I can easily swap back and forth between them.
second, I double-checked my currently reported follower count, which is now 475, since I got a follower in the time it’s taken to write this post.
(coincidentally that follower is @yourhighnessisspeaking, who appears to be a databasey-programmery-type… hi there you! unlike the droves of hapless folks who followed me based on the anime post, this one is hopefully right up your alley!)
third, I followed this blog with my new one.
fourth, I verified that my follower count increased and the new blog appears in my list of reported followers:
fifth, I deleted my new blog:
sixth, I refreshed my follower page:
my follower count is still reported as 476, but the deleted blog is no longer shown in my list.
so why would things work this way? the main reason is probably a programming concept called caching.
when you program a computer to “cache” something, you are coding it to save the simple results of a complex operation. followers are a perfect example: to get the simple number that represents my follower count, a computer has to comb through every single one of my followers.
in real world terms, a “cache” is a hidden collection of things. computer caches are not completely dissimilar: users are not supposed to know that a cache exists. so why were we able to notice this one?
well, look at it this way. like humans, computers only have a certain amount of spoons (though programmers use terms like cycles or bandwidth).
let’s say you are Tumblr’s servers.
every day, you have 500,000,000 spoons to work with.
it takes 1 spoon to count 1 follower.
Tumblr has something upwards of 10 million users. let’s say half of them view their profile pages every day, with an average of 100 followers each.
that’s 500 million spoons, JUST from half of the user base loading their profile page ONCE.
what if, instead of calculating your amount of followers manually every single time, they cached it, and simply added 1 to the cached number every time someone follows you?
now, instead of taking 100 spoons every time someone hits “refresh”, it takes 0 spoons. you only have to spend 1 spoon to increment someone’s cached count when someone actually follows them.
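here's a toy sketch of that cached counter. to be clear, this is my guess at the general shape, not Tumblr's actual code: the `Blog` class and its method names are hypothetical. it also shows how the number drifts if deleting a follower updates the list but not the cached count, which matches what the experiment above showed.

```python
class Blog:
    """toy model of a blog with a cached follower count (all names hypothetical)."""

    def __init__(self) -> None:
        self.followers = set()
        self.cached_count = 0

    def follow(self, name: str) -> None:
        # 1 spoon: bump the cached number when someone new follows
        if name not in self.followers:
            self.followers.add(name)
            self.cached_count += 1

    def follower_deleted(self, name: str) -> None:
        # the hunch: the follower leaves the list, the cached number is untouched
        self.followers.discard(name)

    def recount(self) -> int:
        # the expensive way: walk every follower, 1 spoon each
        return sum(1 for _ in self.followers)


blog = Blog()
for name in ["alice", "bob", "carol"]:
    blog.follow(name)

blog.follower_deleted("bob")
print(blog.cached_count, blog.recount())
```

the cached count still says 3 while the real list only has 2 entries, the same kind of gap as 474 versus 361.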
“but Max”, you might say, “can’t you also just program it to DECREASE the follower count when someone’s blog is deleted?”
you certainly could.
“but Max”, you might ask, “that doesn’t sound very complex. why doesn’t Tumblr do that?”
well, that concludes today’s adventure in web development chicanery. I hope it was enjoyable, even or especially for those of you who don’t know much about computer. let me know what you think, especially if you’d be interested in reading more “DEEP DIVES” like this!
A potential theory for this: according to Tumblr, I have followed several people around five times. And I don’t mean followed, unfollowed, and refollowed. I mean that if you go to my page listing all the people I’m following, several users appear anywhere between two and five times IN A ROW. Unfollowing any one of those duplicates unfollows all of them, but refollowing refollows multiple copies as well.
TLDR: Tumblr is busted as shit and no one is surprised.
tumblr also won’t decrease your follower count when you block someone (so all those porn bots that followed you still show up in your numbers)
I don’t think that’s true, because I definitely lose followers when I block porn bots that follow me (which I do regularly).
Oh hey, that makes sense! The caching only updates when Tumblr actually needs to care about the follower list. When you’re listing your followers, a cache is good enough. When you’re removing, a cache is good enough. Know what isn’t good enough?
When you make a post and have to actually leave the nodes in your followers’ histories. From a low level communication network standpoint, that’s the only reason, because you have to hit all those histories anyways, so now is the time to update.
So remember that paradox about when you post an update to a rarely-updated blog, and you lose a dozen followers? That’s just the cache updating.