Sunday, June 09, 2013

Really Simple Social Blogging

A proposal to implement a decentralized Tumblr/Facebook/Twitter-like social blogging platform using simple building blocks like WebMention and Microformats. This is based on some experiments I'm doing with Converspace on sandeep.io and was inspired by The First Federated #Indieweb Comment Thread.

Based on usage, it looks like I primarily do 4 things on sandeep.io:
  1. Post original content. This could be text (both short and long form), links, photos, videos, quotes, etc. (http://www.sandeep.io/19)
  2. Repost content from others I find interesting. (http://www.sandeep.io/36)
  3. Comment on content from others. (http://www.sandeep.io/32)
  4. Like content from others. (http://www.sandeep.io/33)

Turns out this is also broadly what you do on Tumblr, Twitter and Facebook:
  • Tumblr: blog, reblog, comment and like.
  • Twitter: tweet, retweet, reply and favorite.
  • Facebook: update status, share, comment and like.

So I set out to see how this could be done in a decentralized way across the #indieweb. A couple of experiments later, I think I have a simple way to achieve this using nothing more than WebMention and Microformats.

The "social" part of this is letting others know that you have done one of the 4 things listed above, especially the person whose content you've reposted, liked or commented on.

This is where WebMention comes in. It's a simple way to let any URL on the web know that you've linked to it on your site. The problem, however, is communicating the context in which the URL was mentioned:
  • Was it just mentioned in passing along with other content?
  • Was its content reposted?
  • Was it liked?
  • Was it linked to by someone commenting on it?
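The notification itself carries only two URLs, which is why the context has to be recovered from the source page. A minimal sketch of building that request, assuming the target's WebMention endpoint is already known (endpoint discovery is left out, and the `example.com` URLs are hypothetical):

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_webmention_request(endpoint: str, source: str, target: str) -> Request:
    """Build (but do not send) the POST that tells `target` about `source`."""
    # A WebMention is a form-encoded POST with exactly these two parameters.
    body = urlencode({"source": source, "target": target}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_webmention_request(
    "https://example.com/webmention",  # hypothetical endpoint of the target site
    "http://www.sandeep.io/33",        # the post that contains the link
    "https://example.com/some-post",   # hypothetical URL being linked to
)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing `req` to `urllib.request.urlopen` would actually send it; nothing in the body says whether the link is a like, a repost or a comment.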

Taking a cue from the experimental u-in-reply-to microformat, I'm using the following experimental classnames for links within h-entry:

A target URL that receives a WebMention can retrieve the source URL's HTML content and look for the above Microformat classnames to figure out the context in which it was mentioned, along with an h-card/p-author entry to figure out the person involved.
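Here is a rough sketch of that lookup using only the standard library. Only u-in-reply-to is named in this post; `u-like` and `u-repost` are placeholder classnames assumed for illustration. The sketch also skips h-entry scoping and the h-card/p-author lookup, which a real receiver would want.

```python
from html.parser import HTMLParser
from typing import List

# Classname -> kind of mention. Only u-in-reply-to comes from the post itself;
# u-like and u-repost are assumed placeholder names for this sketch.
CONTEXT_CLASSES = {
    "u-in-reply-to": "comment",
    "u-like": "like",
    "u-repost": "repost",
}


class MentionContextParser(HTMLParser):
    """Collect the context of every <a> in the source that points at the target."""

    def __init__(self, target: str) -> None:
        super().__init__()
        self.target = target
        self.contexts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if attributes.get("href") != self.target:
            return
        classes = (attributes.get("class") or "").split()
        matched = [CONTEXT_CLASSES[c] for c in classes if c in CONTEXT_CLASSES]
        # A link without a recognised classname counts as a plain mention.
        self.contexts.extend(matched or ["mention"])


def mention_contexts(source_html: str, target: str) -> List[str]:
    """Return the kinds of mention of `target` found in `source_html`."""
    parser = MentionContextParser(target)
    parser.feed(source_html)
    return parser.contexts
```

For example, `mention_contexts('<a class="u-like" href="http://example.com/post">+1</a>', "http://example.com/post")` yields a single like, while a bare link to the same URL is reported as a plain mention.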

The target can then show:
  • Total number of likes along with the details of the people that liked it.
  • Total number of reposts along with the details of the people that reposted it.
  • Total number of mentions along with the URLs of the sites that mentioned it.
  • Comments along with the details of the people that commented on it.
See this in action here: Indieweb Federated "Likes".
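As an illustration of the bookkeeping behind such a display, the sketch below groups stored mentions by kind. The record shape (`kind`, `author`, `source`) is an assumption made for this example, not something WebMention or Microformats prescribe.

```python
from collections import defaultdict


def summarize(mentions):
    """Group received mentions by kind, with a total and who or where they came from."""
    grouped = defaultdict(list)
    for mention in mentions:
        # Likes, reposts and comments show the person; plain mentions show the site URL.
        if mention["kind"] == "mention":
            shown = mention["source"]
        else:
            shown = mention["author"]
        grouped[mention["kind"]].append(shown)
    return {kind: {"total": len(items), "from": items} for kind, items in grouped.items()}


# Hypothetical stored mentions for one target URL.
received = [
    {"kind": "like", "author": "Alice", "source": "http://alice.example/1"},
    {"kind": "like", "author": "Bob", "source": "http://bob.example/7"},
    {"kind": "mention", "author": "Carol", "source": "http://carol.example/links"},
]
print(summarize(received))
```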

An important part that is missing from the above is letting other people easily follow you and get updates when you post something on your site. A microformats-based feed reader should solve that. Following someone also gives you the opportunity to send a WebMention to the profile URL of the person you followed, which in turn allows that person to show a follower count (using u-follow maybe) along with the details of the followers. I've yet to explore this but will post more details when I get to it and dogfood it.


Wishlist: A microformats search engine that crawls the web looking for microformats, especially h-card so I can search for people just like I can on silo social networks.

Here are some additional experimental classnames I'm considering but not yet using:
  • u-quote to be used when you quote text from a URL verbatim.
  • u-follow to be used when you follow/subscribe to a URL (usually a person)


Todo

  • A way to undo WebMentions (e.g., unlike): delete the source URL and send the WebMention again. When the target tries to retrieve the source it receives a 404, and in turn deletes the original WebMention.
  • I'm also hoping to extend WebMention to allow private URLs that can be accessed only by the people who were sent a WebMention for them.
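The undo idea in the first item could look roughly like this on the receiving side. This is a sketch under assumptions: `store` is a plain dict keyed by (source, target), and `fetch_status` is a hypothetical callable that returns the HTTP status code for a URL, injected so the logic can be tried without a network. Checking that the source really links to the target is omitted.

```python
def apply_webmention(store, source, target, fetch_status):
    """Update `store` for one WebMention; return 'saved', 'deleted' or 'ignored'."""
    status = fetch_status(source)
    key = (source, target)
    if status == 404:
        # The source is gone: treat this WebMention as an undo of the earlier one.
        removed = store.pop(key, None)
        return "deleted" if removed is not None else "ignored"
    if status == 200:
        store[key] = True
        return "saved"
    return "ignored"


store = {}
like_url = "http://www.sandeep.io/33"
target_url = "https://example.com/some-post"  # hypothetical target

# First the like is received, then it is undone after the source was deleted.
print(apply_webmention(store, like_url, target_url, lambda url: 200))
print(apply_webmention(store, like_url, target_url, lambda url: 404))
```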

Updates

08 June, 2013
  • Added h-card search wishlist.
09 June, 2013
  • Added attribution to @eschnou's indieweb comment thread, which was the first instance I know of that combined something like WebMention (Pingback) and Microformats to figure out context. It went beyond the simple rel="in-reply-to" suggestion made in WebMention and read h-cards.
  • Added note about sending WebMentions to user profile URLs. (Remembered to add this thanks to this tweet by @benwerd.)
  • Added note about private access. (Remembered to add this thanks to this tweet by @benwerd.)
  • Added list of other experimental classnames I'm considering.
10 June 2013
  • Created the Todo section and added note about undoing WebMentions.

See Also

Friday, June 07, 2013

Extracting machine tags (aka triple tags) from a string

Here's some working code to extract machine tags (aka triple tags) from a string. Possibly one of the ugliest regular expressions I've ever written.
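For illustration only, and not the regular expression referred to above, here is a minimal sketch that assumes Flickr-style machine tags of the form namespace:predicate=value, where the value may be wrapped in double quotes. The function name and the exact character rules are assumptions of this example.

```python
import re

MACHINE_TAG = re.compile(
    r"""
    (?<![\w:])                              # do not start in the middle of a word
    (?P<namespace>[A-Za-z][A-Za-z0-9_]*) :
    (?P<predicate>[A-Za-z][A-Za-z0-9_]*) =
    (?: "(?P<quoted>[^"]*)"                 # value in double quotes, may contain spaces
      | (?P<bare>\S+)                       # or a bare value up to the next whitespace
    )
    """,
    re.VERBOSE,
)


def extract_machine_tags(text):
    """Return (namespace, predicate, value) triples found in `text`."""
    tags = []
    for match in MACHINE_TAG.finditer(text):
        value = match.group("quoted")
        if value is None:
            value = match.group("bare")
        tags.append((match.group("namespace"), match.group("predicate"), value))
    return tags


print(extract_machine_tags('Lunch geo:lat=57.64 flickr:user="john doe" end'))
```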

References

Thursday, June 06, 2013

Does polling scale better than push?

For the sake of simplicity, take 10000 subscribers and 1 publisher, and assume the resources required for serving 1 pull request are roughly equal to the resources required for sending 1 push:
  • A hub that is pulled from every minute has to serve number of subscribers x 1440 requests per day, i.e., 10000 x 1440 requests per day irrespective of the number of updates.
  • A hub that pushes has to send number of subscribers x number of updates pushes per day for each publisher, i.e., 10000 subscribers x number of updates x 1 publisher = 10000 x number of updates pushes per day.
  • So that's 10000 x 1440 for pull and 10000 x number of updates for push.
  • Therefore, if the number of updates per day is greater than 1440, a hub that pushes will require more resources than one that is pulled from.
  • More importantly, a hub that is pulled from will not require additional resources if the number of updates per day increases.
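The arithmetic above can be checked with a few lines of code. This is only a sketch of the simplified model: the function names are mine, and it keeps the assumption that one pull costs the same as one push.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 polls per subscriber per day at one poll a minute


def pull_requests_per_day(subscribers, polls_per_day=MINUTES_PER_DAY):
    """Requests served by a polled hub; independent of the number of updates."""
    return subscribers * polls_per_day


def push_requests_per_day(subscribers, updates_per_day, publishers=1):
    """Pushes sent by a pushing hub when every subscriber follows every publisher."""
    return subscribers * updates_per_day * publishers


for updates in (10, 1440, 5000):
    pull = pull_requests_per_day(10000)
    push = push_requests_per_day(10000, updates)
    print(f"{updates} updates/day: pull={pull} push={push}")
```

With 10 updates a day push is far cheaper, at 1440 updates a day the two are equal, and beyond that push costs more under this model.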

Would love to hear what you think (in the comments) especially if you think this might not be the case.

Notes

  • This assumes that >= 1 min latency is ok for your specific use-case.
  • The resources required for serving 1 pull request might not be equal to the resources required for sending 1 push. Here are my notes on why (I would love to hear yours):
    • Given a constant number of subscribers and publishers, a pull-based system will experience a uniform load throughout while a push-based system will experience load in bursts.
    • Push potentially uses less bandwidth, though pull can take advantage of caching.
    • Push has the overhead of subscribers not being available, keeping track of such subscribers and retrying several times.
  • Proof by induction doesn't work because, with push, not every subscriber is subscribed to every publisher.


See PushHubPullSub

This was inspired by my notes on Push vs Pull on the IndieWebCamp wiki.

PushHubPullSub

Publishers Push updates to a Hub and updates are Pulled by Subscribers from the Hub.

PuSH (PubSubHubbub) is a good way to solve the publishers and hubs problems (offloading work and polling lots of sites respectively). The idea with PushHubPullSub is to simplify subscribers by having them poll the hub.
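A toy in-memory sketch of the idea follows. The class, its method names and the cursor-based polling are assumptions made for illustration; nothing here is defined by PubSubHubbub or by this proposal.

```python
class Hub:
    """Publishers push updates in; subscribers poll them out with a cursor."""

    def __init__(self):
        self._updates = []  # append-only log of (publisher, content)

    def push(self, publisher, content):
        """Called by a publisher; returns the position after the stored update."""
        self._updates.append((publisher, content))
        return len(self._updates)

    def pull(self, since=0):
        """Called by a polling subscriber; returns (new updates, next cursor)."""
        return self._updates[since:], len(self._updates)


hub = Hub()
hub.push("sandeep.io", "post 1")
hub.push("sandeep.io", "post 2")

updates, cursor = hub.pull()            # first poll sees everything so far
later, cursor = hub.pull(since=cursor)  # next poll sees only what is new
print(updates, later, cursor)
```

The subscriber only needs to remember one number between polls, which is the kind of simplification on the subscriber side that the idea is after.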

See Does polling scale better than push?

Update: Moved the "Does polling scale better than push?" section to a blog post of its own.

This was inspired by my notes on Push vs Pull on the IndieWebCamp wiki.