Estimating G+ User Activity: 4-6 million active posters in January 2015 to date
Working out the actual active user count on G+ has become something of a cottage industry. I've suggested the label "Plussologists" (from Kremlinologists) for those so engaged.
In his October, 2014 ReCode Interview, G+ bossman Dave Besbris rather pointedly said "I don't want to talk about numbers".
Why not, Dave? Hiding something?
This analysis estimates active G+ users, defined as those who've made a post to G+ (not simply commented on a YouTube video) in the month of January, 2015. It's based on pulling Google's own profile sitemaps and sampling the profile pages they list. You should be able to replicate the process yourself (or with a hackishly-minded assistant) using the methods described.
Summary of findings:
- There are about 2.2 billion G+ profiles total.
- Of these, about 9% have any publicly-posted content.
- Of those, about 37% have as their most recent activity a YouTube comment, another 8% profile photo changes (45% of all "active" profiles).
- Only 6% of profiles which have ever been publicly active have any post activity in 2015 (18 days so far).
- For only around half of those, 3% of active profiles, that activity is something other than YouTube comments.
- That is, 0.3% of all G+ profiles, about 6.6 million users, have made a public G+ post in 2015. That's ~367,000 users posting daily if each posts only once (the actual post frequency will vary somewhat).
This doesn't include non-public posts or comments, or lurkers, but it's a pretty clear indication of the level of publicly visible activity on G+.
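The arithmetic behind the last summary bullet can be rechecked with a minimal sketch like the one below. It uses awk for the floating-point math and takes the rounded summary figures (2.2 billion profiles, 0.3%, 18 days) as inputs. The function name is hypothetical, just for illustration.

```shell
# Sketch only: redo the summary arithmetic from the rounded figures above.
# estimate_posters TOTAL_PROFILES POSTER_FRACTION DAYS
# Prints the estimated number of posters and the per-day average.
estimate_posters() {
  awk -v total="$1" -v frac="$2" -v days="$3" 'BEGIN {
    posters = total * frac
    printf "posters: %.1f million\n", posters / 1e6
    printf "per day: %.0f thousand\n", posters / days / 1e3
  }'
}

# 2.2 billion profiles, 0.3% posting, 18 days of January so far
estimate_posters 2200000000 0.003 18
```

With those inputs it prints 6.6 million posters and 367 thousand per day, matching the figures in the list.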
An article at Digital Inspiration tipped me off to the fact that Google publishes sitemaps to allow for spidering of Google+ by search engines:
The main sitemap file is here; it lists another 50,000 files and has a last-modified date of 2015-01-17:
Rather than download 50,000 sitemaps and view the profiles in each, I made some simplifying assumptions. I base my analysis on an arbitrarily selected file, sitemap-25007-of-50000.gz (25,007 of 50,000, intentionally from near the middle of the pack). It contains 45,429 profile pages. Assuming reasonably uniform distribution, 45,429 * 50,000 gives 2.2 billion and change profiles. [Update: this is confirmed in Greg Miernicki and François Beaufort's analysis here: http://plus.miernicki.com/ Ed.]
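The extrapolation above can be sketched as follows, assuming the sitemap file is a gzipped list with one profile URL per line (which is how the later loop reads it). The function names are hypothetical, and gzip -dc is used in place of zcat only because it behaves the same on Linux and Mac OS.

```shell
# Sketch only: count the URLs in one sitemap file, then scale by file count.
# Assumes one profile URL per non-empty line in the gzipped file.

# count_profiles FILE
count_profiles() {
  gzip -dc "$1" | grep -c .
}

# estimate_total FILE NUMBER_OF_SITEMAP_FILES
estimate_total() {
  echo $(( $(count_profiles "$1") * $2 ))
}

# Usage (needs the downloaded file):
#   estimate_total sitemap-25007-of-50000.gz 50000
```

For a file with 45,429 URLs this gives 2,271,450,000, the "2.2 billion and change" quoted above.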
I'm assuming a random distribution of profiles through sitemap files (there are good reasons to assume this is the case, though it might not be). If that holds, sampling a single file is effectively a random selection, which should avoid most sample bias.
From here, some straightforward shell commands (I'm on Linux, Mac OS users should have comparable tools, Windows -- install Cygwin to play along). I run the following as a bash one-liner (expanded for clarity):
i=0
time zcat sitemap-25007-of-50000.gz |
  while read -r URL; do
    i=$(( i + 1 ))
    printf '%s: ' "$i"
    lynx -dump "$URL" | grep "hasn't shared anything" || echo "Not found"
  done | tee log
- Sets a counter (i)
- Times the full operation
- Reads from the sitemap file (sitemap-25007-of-50000.gz) and takes URLs for profile pages, feeding to a loop.
- Increments the counter
- Uses a console-mode browser to extract formatted text from the URL and looks for "hasn't shared anything" -- any profile with no public content has this string. If the string is missing, prints "Not found" instead, which refers to the string, not the profile page.
- End of loop
- Dump output to a logfile.
I've intentionally kept the process simple and slow to avoid being throttled or tripping abuse / attack defenses. So far the script's worked fine (21,000+ profiles as I write this, the pull's still running).
The logfile generally looks like:
1: Jenilee hasn't shared anything with you.
2: Brian hasn't shared anything with you.
3: Gene hasn't shared anything with you.
4: kishor hasn't shared anything with you.
5: Daniel hasn't shared anything with you.
6: aping hasn't shared anything with you.
7: Corey hasn't shared anything with you.
8: Not found
9: Ohh hasn't shared anything with you.
10: kinyo2006 hasn't shared anything with you.
11: patrik hasn't shared anything with you.
12: Melina hasn't shared anything with you.
13: Not found
14: Akihito hasn't shared anything with you.
15: Paul hasn't shared anything with you.
16: Pamela hasn't shared anything with you.
17: Eddie hasn't shared anything with you.
18: bekzat hasn't shared anything with you.
19: H hasn't shared anything with you.
20: Calm hasn't shared anything with you.
The "Not found" lines refer to the string I'm searching for, not the profile page. Those are actually the active profiles.
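Tallying a logfile in this format takes one awk pass. This is a minimal sketch, assuming every line is either "N: ... hasn't shared anything with you." or "N: Not found" as in the excerpt above. The function name is hypothetical.

```shell
# Sketch only: count "Not found" lines (profiles with public content)
# against all lines read from the log.
# tally_log LOGFILE
tally_log() {
  awk '
    /^[0-9]+: Not found$/ { active++ }
    END {
      if (NR == 0) { print "no lines read"; exit 1 }
      printf "%d of %d profiles active (%.2f%%)\n", active, NR, 100 * active / NR
    }' "$1"
}

# Usage:
#   tally_log log
```

On the 20 sample lines shown above it prints "2 of 20 profiles active (10.00%)".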
As noted above, this is actually still running as I write, though again, based on random statistical sampling, I'm pretty confident of the numbers which emerge. I'm providing updated numbers below which are based on more data than the links above.
With 21,126 profiles read, 9.22% presently report something posted publicly. That percentage has ranged from a low of about 8% to a high of 11% as I've watched the script run, and mostly stayed between 8-10%.
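For a rough sense of how much a figure like 9.22% can wobble from sampling alone, here is a sketch of the usual normal-approximation 95% interval for a proportion. It assumes the profiles read behave like a simple random sample, which is the same assumption made about the sitemap file, and it says nothing about bias if that assumption is wrong. The function name is hypothetical.

```shell
# Sketch only: approximate 95% interval for a sample proportion,
# p +/- 1.96 * sqrt(p * (1 - p) / n). Assumes simple random sampling.
# proportion_ci PROPORTION SAMPLE_SIZE
proportion_ci() {
  awk -v p="$1" -v n="$2" 'BEGIN {
    se = sqrt(p * (1 - p) / n)
    printf "%.2f%% to %.2f%%\n", 100 * (p - 1.96 * se), 100 * (p + 1.96 * se)
  }'
}

# 9.22% of 21,126 profiles
proportion_ci 0.0922 21126
```

For these inputs the interval comes out at roughly 8.8% to 9.6%, narrower than the 8% to 11% swing seen earlier in the run, when far fewer profiles had been read.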
Of profiles with any (public posting) activity, about 37% have as their most recent activity a comment to a YouTube video. Another 8% have a profile photo change. Both can be matched via pattern text search ("commented on a video on YouTube" and "changed .* profile photo").
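A sketch of how those two patterns might be applied to a page dump follows. Note the simplification: it checks whether a pattern appears anywhere in the text, which is cruder than checking the most recent activity only. The function name and the output labels are hypothetical.

```shell
# Sketch only: label a profile page dump read from stdin.
# Matches anywhere in the text, not just the most recent item.
classify_profile() {
  cp_text=$(cat)
  if printf '%s\n' "$cp_text" | grep -q "hasn't shared anything"; then
    echo "inactive"
  elif printf '%s\n' "$cp_text" | grep -q "commented on a video on YouTube"; then
    echo "youtube-comment"
  elif printf '%s\n' "$cp_text" | grep -q "changed .* profile photo"; then
    echo "photo-change"
  else
    echo "other"
  fi
}

# Usage:
#   lynx -dump "$URL" | classify_profile
```

Anything labelled "other" still needs a look by hand or a further pattern, since an ordinary post has no fixed marker string among the patterns named here.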
With 7,875 profiles checked, I'd found 533 with some activity, of which 454 had a post date on the profile page. Breaking down most recent (public) posting activity by year for these profiles:
[Table: most recent public posting activity, by year and observed profiles]
Note that this is not a frequency distribution of posts by user by date, but shows when the profile was last active.
31% of all G+ profiles with any activity at all have not shown public activity since 2013.
And only 42/533 profiles (again, with any public activity) show any activity for 2015. That's 7.9% of "active" profiles, or 0.53% of all 7,875 profiles checked. Scaled to 2.2 billion profiles, that's about 11.7 million accounts.
But wait. Of those 42 profiles, 18 "posts" are actually comments to YouTube videos; that's 43% of all public 2015 activity.
I've spot-checked, and to a high approximation, if the most recent post was a YouTube comment, all a profile's posts are. And this tells us why Google was so eager to combine YouTube and G+ comments -- it's doubled the apparent G+ activity.
So we've got a grand spanking total of 24 profiles out of 7,875 whose 2015 post activity isn't YouTube comments but Google+ posts. That's a 0.3% rate of all profile pages, going back to our 2.2 billion profiles.
That gives us 6.7 million active users. A really small count for the effort Google's put into this product.
No wonder Dave Besbris doesn't want to talk numbers.
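The scaling from sample counts to a population estimate can be sketched like this, using the counts quoted above (24 posters among 7,875 profiles checked) and the rounded 2.2 billion total. The function name is hypothetical.

```shell
# Sketch only: scale a count observed in the sample up to all profiles.
# scale_to_population HITS SAMPLE_SIZE TOTAL_PROFILES
scale_to_population() {
  awk -v hits="$1" -v sample="$2" -v total="$3" 'BEGIN {
    rate = hits / sample
    printf "%.2f%% of profiles, about %.1f million\n", 100 * rate, rate * total / 1e6
  }'
}

# 24 of 7,875 sampled profiles posted in 2015; 2.2 billion profiles total
scale_to_population 24 7875 2200000000
```

This prints "0.30% of profiles, about 6.7 million". Using the unrounded 45,429 * 50,000 total instead nudges the estimate up slightly.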
I've also found that 11 profiles have redirects to personalized URLs. Spot-checking those ... the situation doesn't change much -- they fall into the patterns above or consist of old activity.
Caveats: I'm a space alien cat living outside the solar system. I don't have any conflicts of interest, other than a considerably growing distrust of Google. I use G+ actively myself, and find it useful, though limiting and exceptionally frustrating. And I dislike Facebook far more.
The conclusions here are based on sampling and I've seen them shift by a few million as more data rolls in. Still, I'm pretty confident that the total number of publicly posting (not commenting, which could well be far higher) users is well below 10 million for January, 2015.
Update: (22-Jan-2015) I've clarified in a few spots that "activity" corresponds to "public posting".
I've also completed the first pull of 45K+ sampled records. Cumulative results are consistent with the first 21k results noted above. I'm verifying that the sampling appears to be consistent and spot-checking with selections from other sitemap files.
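One way to pick other sitemap files for spot-checking is to draw file numbers at random. The sketch below makes several assumptions: that files are numbered 1 through 50,000, that numbers are zero-padded to five digits, and that names follow the sitemap-25007-of-50000.gz pattern. It draws with replacement, so a duplicate is possible though unlikely. The function name is hypothetical.

```shell
# Sketch only: print COUNT randomly chosen sitemap file names.
# Numbering range and zero-padding are assumptions, not confirmed.
# pick_sitemaps COUNT NUMBER_OF_SITEMAP_FILES
pick_sitemaps() {
  awk -v n="$1" -v max="$2" 'BEGIN {
    srand()   # seed from the clock
    for (i = 0; i < n; i++)
      printf "sitemap-%05d-of-%d.gz\n", int(rand() * max) + 1, max
  }'
}

pick_sitemaps 5 50000
```

Each file picked this way can then be run through the same loop and the resulting percentages compared against the first file.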
The 4,215 "active" profile IDs identified via the analysis above are in the following pastebin: http://pastebin.com/tmdcsKLZ
While posting a full set of the "inactive" accounts is a bit much, I've used a randomly selected sample of 30 for further analysis as well, here: