Sandeep Shetty's Blog

Thursday, June 06, 2013

Does polling scale better than push?

For the sake of simplicity, given 10000 subscribers and 1 publisher, and assuming the resources required to serve 1 pull request are roughly equal to the resources required to send 1 push:

A hub that is pulled from every minute has to serve (number of subscribers x 1440) requests per day, i.e., 10000 x 1440 requests per day, irrespective of the number of updates.
A hub that pushes has to send (number of subscribers x number of updates per publisher x number of publishers) pushes per day, i.e., 10000 x number of updates x 1 = 10000 x number of updates pushes per day.
So that's 10000 x 1440 for pull and 10000 x number of updates for push.
Therefore, if the number of updates per day is greater than 1440, a hub that pushes will require more resources than one that is pulled from.
More importantly, a hub that is pulled from will not require additional resources if the number of updates per day increases.
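The arithmetic above can be checked with a few lines of Python. This is a minimal sketch under the post's own assumptions (1 publisher, every subscriber subscribed to it, one pull costing about the same as one push); the function names are made up for illustration.

```python
# Rough daily load for one hub, assuming serving one pull request costs
# about the same as sending one push (the post's simplifying assumption).

MINUTES_PER_DAY = 24 * 60  # 1440


def pull_requests_per_day(subscribers: int, poll_interval_minutes: int = 1) -> int:
    """Every subscriber polls once per interval, whether or not anything changed."""
    return subscribers * (MINUTES_PER_DAY // poll_interval_minutes)


def pushes_per_day(subscribers: int, updates_per_day: int) -> int:
    """Every update is pushed to every subscriber (1 publisher, all subscribed)."""
    return subscribers * updates_per_day


if __name__ == "__main__":
    subscribers = 10_000
    for updates in (10, 1440, 5000):
        pull = pull_requests_per_day(subscribers)
        push = pushes_per_day(subscribers, updates)
        cheaper = "push" if push < pull else "pull" if pull < push else "neither"
        print(f"{updates:>5} updates/day: pull={pull:,} push={push:,} cheaper={cheaper}")
```

The break-even point is exactly 1440 updates per day for a one-minute polling interval; a longer interval (e.g. `poll_interval_minutes=5`) moves it down to 288.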

Would love to hear what you think (in the comments) especially if you think this might not be the case.

Notes

This assumes that a latency of 1 minute or more is OK for your specific use case.
Resources required for serving 1 pull request might not be equal to resources required for sending 1 push. Here are my notes on why; I would love to hear yours:

Given a constant number of subscribers and publishers, a pull-based system will experience a uniform load throughout, while a push-based system will experience load in bursts.
Push potentially uses less bandwidth, though pull can take advantage of caching.
Push has the overhead of dealing with unavailable subscribers: keeping track of them and retrying several times.
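To make the uniform-versus-bursty point concrete, here is a small sketch that builds a per-minute load profile for a day. The update times are hypothetical inputs, not measurements, and retries for unavailable subscribers are deliberately left out.

```python
from collections import Counter

MINUTES_PER_DAY = 24 * 60


def pull_load(subscribers: int) -> list[int]:
    # Each subscriber polls once per minute, every minute: a flat profile.
    return [subscribers] * MINUTES_PER_DAY


def push_load(subscribers: int, update_minutes: list[int]) -> list[int]:
    # Each update fans out to every subscriber in the minute it happens;
    # minutes with no updates cost nothing.
    updates_per_minute = Counter(update_minutes)
    return [subscribers * updates_per_minute[m] for m in range(MINUTES_PER_DAY)]


if __name__ == "__main__":
    pull = pull_load(10_000)
    push = push_load(10_000, update_minutes=[540, 540, 541, 1200])
    print("pull: peak", max(pull), "total", sum(pull))
    print("push: peak", max(push), "total", sum(push))
```

With only four updates the push total is far lower, but its peak (two updates landing in the same minute) is twice the steady polling load.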

Proof by induction (generalizing from 1 publisher to many) doesn't work, because with push not every subscriber is subscribed to every publisher.

See PushHubPullSub

This was inspired by my notes on Push vs Pull on the IndieWebCamp wiki.

PushHubPullSub

Publishers Push updates to a Hub, and Subscribers Pull updates from the Hub.

PuSH (PubSubHubbub) is a good way to solve the publishers' and hubs' problems (offloading work and polling lots of sites, respectively). The idea with PushHubPullSub is to simplify subscribers by having them poll the hub.
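A minimal in-memory sketch of the PushHubPullSub idea: publishers push into the hub, and subscribers poll it with a cursor (the last sequence number they saw). The class and method names are hypothetical, and transport, authentication, and persistence are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    seq: int      # assigned by the hub, increases with every update
    topic: str    # the publisher's feed URL
    content: str


@dataclass
class Hub:
    updates: list[Update] = field(default_factory=list)

    def publish(self, topic: str, content: str) -> int:
        """Push side: a publisher hands an update to the hub and gets its sequence number."""
        seq = len(self.updates) + 1
        self.updates.append(Update(seq, topic, content))
        return seq

    def poll(self, topics: set[str], since: int = 0) -> list[Update]:
        """Pull side: a subscriber asks for updates newer than `since` on the topics it follows."""
        return [u for u in self.updates if u.seq > since and u.topic in topics]


if __name__ == "__main__":
    hub = Hub()
    hub.publish("https://a.example/feed", "first post")
    hub.publish("https://b.example/feed", "unrelated")
    last_seen = 0
    new = hub.poll({"https://a.example/feed"}, since=last_seen)
    last_seen = new[-1].seq if new else last_seen  # the subscriber keeps its own cursor
    print([u.content for u in new], last_seen)
```

Because each subscriber keeps its own cursor, the hub in this sketch never has to track which subscribers are reachable or retry failed deliveries.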

See Does polling scale better than push?

Update: Moved the "Does polling scale better than push?" section to a blog post of its own.

This was inspired by my notes on Push vs Pull on the IndieWebCamp wiki.

Friday, May 31, 2013

RecentChanges, a simple alternative to ActivityStreams

For updates watch: https://github.com/converspace/recentchanges

Some thoughts on representing updates to a site, inspired by wiki-style RecentChanges:

Every resource (URL) has a RecentChanges endpoint.

The RecentChanges endpoint at each level of the (URL) hierarchy aggregates all RecentChanges under it.

The RecentChanges endpoint of the site aggregates site-wide RecentChanges.
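The hierarchical aggregation can be sketched as a filter on URL paths: an endpoint returns the changes to its own URL plus every URL below it, so the site root sees everything. The change records (plain dicts with `url` and `time` keys) are hypothetical.

```python
from urllib.parse import urlsplit


def is_under(resource: str, scope: str) -> bool:
    """True if `resource` is `scope` itself or lives below it in the URL path hierarchy."""
    r, s = urlsplit(resource), urlsplit(scope)
    if (r.scheme, r.netloc) != (s.scheme, s.netloc):
        return False
    r_path, s_path = r.path.rstrip("/"), s.path.rstrip("/")
    # Compare whole path segments so that /blogroll is not treated as under /blog.
    return r_path == s_path or r_path.startswith(s_path + "/")


def recent_changes(changes: list[dict], endpoint_for: str) -> list[dict]:
    """Changes at or below `endpoint_for`, newest first."""
    scoped = [c for c in changes if is_under(c["url"], endpoint_for)]
    return sorted(scoped, key=lambda c: c["time"], reverse=True)


if __name__ == "__main__":
    changes = [
        {"url": "https://example.com/blog/foobar", "verb": "post", "time": 1},
        {"url": "https://example.com/wiki/home", "verb": "update", "time": 2},
        {"url": "https://example.com/blog/foobar", "verb": "update", "time": 3},
    ]
    print(recent_changes(changes, "https://example.com/blog"))
```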

RecentChanges only requires/uses 4 verbs: Post, Respond, Update, Delete (open to renaming these, but the idea is that 4 verbs are enough).

Examples:

Sandeep Shetty posted Foobar. (new post)

Sandeep Shetty updated Foobar. (edited an existing post)

AnonymousOnPurpose responded to Foobar. (commented on a post - could even be a response to a specific comment)

Sandeep Shetty deleted Foobar. (deleted a post)
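One way to model the four verbs is an enum plus a small record that renders the human-readable sentences shown above. The field names are hypothetical and no serialization format is implied.

```python
from dataclasses import dataclass
from enum import Enum


class Verb(Enum):
    # The value is the past-tense phrase used when rendering a change.
    POST = "posted"
    RESPOND = "responded to"
    UPDATE = "updated"
    DELETE = "deleted"


@dataclass
class Change:
    actor: str   # who made the change
    verb: Verb
    title: str   # title of the affected resource
    url: str     # URL of the affected resource

    def sentence(self) -> str:
        return f"{self.actor} {self.verb.value} {self.title}."


if __name__ == "__main__":
    print(Change("AnonymousOnPurpose", Verb.RESPOND, "Foobar", "https://example.com/foobar").sentence())
```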

Monday, May 06, 2013

Thinking About Metadata

Some of my thoughts on tagging and metadata in Converspace:

Syntax over Interface: I prefer (from a user experience perspective) how tagging (and other metadata like mentions, etc.) evolved on Twitter to be just syntax and became part of the content (without being obtrusive), with no special interface elements dedicated to them. This allows the same interface to serve both people who don't need them and people who do: invisible to users who don't need them, yet always there for those who do.
Visible Metadata: My preference for tags being part of the content has the advantage of them always being visible (they move/hang with the content). However, it also has the disadvantage of not being able to cleanly support things like private tags (like how Pinboard treats tags that start with a period, e.g., .secret_tag). One obvious advantage of private tags is that you can do stuff like what Selective Tweets does with the #fb tag, but without a visible public tag: like this IFTTT recipe that cross-posts Pinboard bookmarks that have the .twitter private tag to Twitter. For the specific use case of publishing workflows, I'm considering using (something I'm calling) local action tags (tags that start with &, e.g., &action_tag) that are ephemeral, consumed by the publishing workflow, and not saved as part of the content. Action tags obviously cannot be interspersed with the content and will have to be added at the end. I still need to figure out how this will work when the publishing workflow is also adding machine tags at the end.
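Treating tags and mentions as plain syntax means a reader of the content can pull them out with simple pattern matching, while the stored text stays unchanged. A sketch, assuming (hypothetically) that a tag is `#` and a mention is `@`, each followed by word characters and preceded by whitespace or the start of the text:

```python
import re

# `(?<!\S)` means "at the start of the text or right after whitespace",
# so e-mail addresses and things like "C#" are not picked up.
TAG = re.compile(r"(?<!\S)#(\w+)")
MENTION = re.compile(r"(?<!\S)@(\w+)")


def extract_metadata(content: str) -> dict[str, list[str]]:
    """Tags and mentions found in the content; the content itself is left as-is."""
    return {"tags": TAG.findall(content), "mentions": MENTION.findall(content)}


if __name__ == "__main__":
    print(extract_metadata("Reading up on #indieweb with @aaronpk, thanks #webmention"))
```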

Update (after sleeping on it): Won't be implementing action tags (as described above) because of their limited scope (especially when it comes to allowing third parties to participate in the publishing workflow) ~~and I'm on the fence about Machine/Triple tags~~.

Update (May 07, 2013):

Auto-tagging: Allow the publishing workflow to automatically add tags (including Machine/Triple tags). This is hard when you don't have a separate tags property and only have one blob of text (content). For example, it might not make sense to add tags at the end of a single-line post when it is missing an ending punctuation mark. To allow for auto-tagging, I came up with a syntax for trailing tags: trailing tags are preceded by a blank line, start with two spaces, followed by space-separated tags, followed by the end of the string, e.g., "\n\n  #additional_tag1 #additional_tag2". Trailing tags can be added at the end of the content if they do not exist, or tags can be appended to existing ones. I chose this syntax for the following reasons:

When viewing the Markdown, trailing tags appear slightly indented, which visually separates them from the rest of the content.
AFAIK, it doesn't conflict with existing Markdown syntax. This makes it invisible when rendered by processors that don't support it.
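The trailing-tags rule (blank line, two spaces, space-separated tags, end of string) maps onto a single regular expression. The sketch below reads existing trailing tags and either appends to them or starts a new trailing-tags line; the function names are hypothetical.

```python
import re

# Blank line, two spaces, one or more space-separated #tags, then end of string.
TRAILING_TAGS = re.compile(r"\n\n  (#\S+(?: #\S+)*)\Z")


def trailing_tags(content: str) -> list[str]:
    """Tags in the trailing-tags line, or [] if the content has none."""
    m = TRAILING_TAGS.search(content)
    return m.group(1).split(" ") if m else []


def add_trailing_tags(content: str, tags: list[str]) -> str:
    """Append tags to existing trailing tags, or add a trailing-tags line if there is none."""
    existing = trailing_tags(content)
    new = [t for t in tags if t not in existing]
    if not new:
        return content
    if existing:
        return content + " " + " ".join(new)
    return content + "\n\n  " + " ".join(new)


if __name__ == "__main__":
    post = add_trailing_tags("Just a single line with no full stop", ["#indieweb"])
    post = add_trailing_tags(post, ["#converspace"])
    print(repr(post))
```

Inline tags in the body (e.g. "about #indieweb today") are not trailing tags, since they are not on their own indented line at the very end.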

Update (June 05, 2013):

Machine tags are invisible metadata: Machine tags provide context for "machines" and should be syndicated but not displayed.
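A sketch of "syndicated but not displayed", assuming machine tags follow the Flickr-style `namespace:predicate=value` form (e.g. `#geo:lat=12.97`); the pattern and function names are illustrative, not a fixed format.

```python
import re

# Assumed machine-tag shape: optional '#', namespace, ':', predicate, '=', value.
MACHINE_TAG = re.compile(r"^#?[A-Za-z_]\w*:[A-Za-z_]\w*=.+$")


def is_machine_tag(tag: str) -> bool:
    return MACHINE_TAG.match(tag) is not None


def for_display(tags: list[str]) -> list[str]:
    """Tags a human reader sees: machine tags are dropped."""
    return [t for t in tags if not is_machine_tag(t)]


def for_syndication(tags: list[str]) -> list[str]:
    """Tags sent along to other sites and services: everything, machine tags included."""
    return list(tags)


if __name__ == "__main__":
    tags = ["#indieweb", "#geo:lat=12.97", "#geo:lon=77.59"]
    print(for_display(tags), for_syndication(tags))
```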

See also:

Monday, April 29, 2013

Webmention action

Excited to see all the action around webmention, which I drafted as a modern alternative to the pingback protocol.

http://tantek.com/2013/113/b1/first-federated-indieweb-comment-thread
http://aaronparecki.com/articles/2013/03/31/1/a-response-to-replies-i-received-on-my-post-an-open-challenge-to-app-net
Client library for sending webmention and pingback notifications: https://github.com/aaronpk/mention-client
pingback.me, a service to convert Pingbacks to WebMentions
http://indiewebcamp.com/comment
http://indiewebcamp.com/webmention
http://lists.w3.org/Archives/Public/public-rww/2013Apr/0078.html
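For readers new to webmention, a rough sketch of the sending side: find the target page's endpoint via a `<link>` or `<a>` element with a webmention rel value, then POST form-encoded `source` and `target` to it. This follows the shape of the later W3C Webmention spec; accepting the older `http://webmention.org/` rel value is an assumption, and discovery via the HTTP `Link` header is left out.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# rel values treated as webmention endpoints (the second one is an assumed legacy value).
RELS = {"webmention", "http://webmention.org/"}


class EndpointFinder(HTMLParser):
    """Records the href of the first <link>/<a> whose rel list contains a webmention rel."""

    def __init__(self) -> None:
        super().__init__()
        self.endpoint: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.endpoint is not None or tag not in ("link", "a"):
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").split()
        if RELS.intersection(rels) and attrs.get("href") is not None:
            self.endpoint = attrs["href"]


def discover_endpoint(target: str, html: str) -> Optional[str]:
    finder = EndpointFinder()
    finder.feed(html)
    # Relative endpoints are resolved against the target URL.
    return urljoin(target, finder.endpoint) if finder.endpoint is not None else None


def build_webmention(endpoint: str, source: str, target: str) -> Request:
    body = urlencode({"source": source, "target": target}).encode()
    return Request(endpoint, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})
```

Sending is then `urllib.request.urlopen(build_webmention(...))`; error handling and retries are left to the caller.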

Sunday, April 07, 2013

Grains vs Milk

"It's also important to consider the big picture when judging the suitability of various foods. It helps to tell stories about the food we eat, to think about narratives. Grains aren't just little morsels of protein, carbs, and fiber bred for our enjoyment. They are baby plant eggs. Those macronutrients are there to sustain the seed's growth and those micronutrients are there to protect it. They are the plant's lifeline to immortality. They are literally shaped by the hand of evolution to survive and ravage the digestive tract of the poor sap that swallows them and discourage further consumption. Grain is only food because we deemed it so. Dairy? Dairy is objectively, absolutely food. Its fat, protein, and carbs are there to be consumed, albeit by young cows, sheep, and goats. It's meant to spur growth, to pack on muscle and fat and weight. And yeah, eating dairy protein causes an insulin spike, but that can be useful if you know what you're doing." -- http://www.marksdailyapple.com/dairy-insulin/