Here's some working code to extract machine tags (aka triple tags) from a string. Possibly one of the ugliest regular expressions I've ever written.
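For illustration, here is a much simpler Python sketch of the same task. It is not the author's PHP regex. It assumes Flickr-style `namespace:predicate=value` tags, where the namespace and predicate start with an ASCII letter and contain only ASCII letters, digits and underscores. It also assumes the value is either a bare run of non-whitespace or a double-quoted string with backslash escapes. The quoted case is matched with the unrolling-the-loop pattern `[^"\\]*(?:\\.[^"\\]*)*`. The names `MACHINE_TAG` and `extract_machine_tags` are hypothetical.

```python
import re

# Hypothetical pattern, not the original PHP regex. Assumes:
#   - namespace and predicate start with an ASCII letter and contain only
#     ASCII letters, digits and underscores;
#   - the value is either a double-quoted string with backslash escapes
#     or a bare run of non-whitespace characters.
MACHINE_TAG = re.compile(
    r"""
    (?<!\S)                                  # start of string or after whitespace
    (?P<namespace>[A-Za-z][A-Za-z0-9_]*)
    :
    (?P<predicate>[A-Za-z][A-Za-z0-9_]*)
    =
    (?:
        "(?P<quoted>[^"\\]*(?:\\.[^"\\]*)*)"   # quoted value, unrolled loop
      | (?P<bare>[^\s"]\S*)                   # unquoted value
    )
    """,
    re.VERBOSE,
)


def extract_machine_tags(text):
    """Return a list of (namespace, predicate, value) tuples found in text."""
    tags = []
    for m in MACHINE_TAG.finditer(text):
        value = m.group("quoted")
        if value is not None:
            # Strip the escaping backslashes: \" -> " and \\ -> \
            value = re.sub(r"\\(.)", r"\1", value)
        else:
            value = m.group("bare")
        tags.append((m.group("namespace"), m.group("predicate"), value))
    return tags


print(extract_machine_tags('sunset geo:lat=37.77 dc:title="A \\"quoted\\" title"'))
```

This sketch does not handle tags wrapped entirely in quotes (e.g. `"dc:title=Hello world"`) or non-ASCII namespaces and predicates.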
References
- Spec:
  - http://www.flickr.com/groups/api/discuss/72157594497877875
  - http://www.aaronland.info/talks/mw10_machinetags/#62
  - http://tagaholic.me/2009/03/26/what-are-machine-tags.html
- Regexes I found elsewhere:
  - https://github.com/ibolmo/habwatch/blob/bcba1a11b7073c2c864296ab89787af7b7fc0e00/lib/util/flickr/Flickr/Photo.class.php
  - https://github.com/cldwalker/has_machine_tags/blob/48ed628c04def3539387c6cdfd7145a004a4b8fc/lib/has_machine_tags/tag_methods.rb
- Regex syntax I used:
  - http://php.net/manual/en/regexp.reference.subpatterns.php
  - http://php.net/manual/en/regexp.reference.back-references.php
  - http://php.net/manual/en/regexp.reference.assertions.php
  - http://php.net/manual/en/regexp.reference.conditional.php
- Unrolling-the-loop technique for quoted strings with escaping:
  - http://ad.hominem.org/log/2005/05/quoted_strings.php
  - http://stackoverflow.com/questions/5695240/php-regex-to-ignore-escaped-quotes-within-quotes/5696141#5696141
  - http://stackoverflow.com/questions/249791/regex-for-quoted-string-with-escaping-quotes/249832#249832
- Negative lookahead as an alternative to using backreferences within character classes.
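The negative-lookahead trick and the conditional subpatterns referenced above can be combined in one pattern. Here is a hypothetical Python sketch. Python's `re` accepts the same `(?(name)yes|no)` conditional and `(?!...)` lookahead syntax as PHP's PCRE. The pattern matches a value that is single-quoted, double-quoted, or bare. Inside a character class, `\1` is read as a literal character rather than a backreference, so `[^\1]` cannot mean "any character but the captured quote". Instead, each character is guarded with the negative lookahead `(?!(?P=q))`. The conditional `(?(q)...)` then chooses the quoted or bare branch and requires a matching closing quote. `VALUE` and its group names are made up for this example.

```python
import re

# Hypothetical sketch, not the author's PHP regex: a value wrapped in ' or "
# quotes (with backslash escapes) or left bare. A numeric escape inside a
# character class is never a backreference, so [^\1] can't exclude the
# captured quote; a negative lookahead on (?P=q) does that job instead.
VALUE = re.compile(
    r"""
    (?P<q>["'])?                  # optional opening quote, captured as q
    (?P<value>
        (?(q)                     # conditional: was an opening quote captured?
            (?:(?!(?P=q))[^\\]    #   yes: any char except that quote or a backslash,
            | \\. )*              #        or a backslash escape, repeated
          | [^\s"']\S*            #   no: a bare run of non-whitespace
        )
    )
    (?(q)(?P=q))                  # require the same closing quote, if one opened
    """,
    re.VERBOSE,
)

for text in ['"it\'s"', "'say \"hi\"'", '"a\\"b"', "bare"]:
    m = VALUE.fullmatch(text)
    print(repr(m.group("q")), repr(m.group("value")))
```

The quoted branch uses plain alternation rather than the unrolled loop, which keeps it readable. Its two alternatives can never start on the same character (one requires a backslash, the other forbids it), so the repetition does not backtrack catastrophically.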