A few weeks ago the NY Times firehose feed broke.
I emailed with a friend at the Times, and we were able to get it working again. But the new version of the firehose is a mere trickle compared to the former raging torrent.
This put me in a bad place because I depend on a gush of NYT headlines in my river. I could subscribe to all the feeds I could find, but that means that I’d get duplicate stories because the Times, like other pubs, runs many stories in multiple feeds.
I’ve been thinking for a long time about a heuristic to fix this: keep track of the titles that have already appeared in a river and skip duplicates. Last night during the Giants game I gave it a shot, and it worked.
I wrote the change up in this worknote.
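For other aggregator developers, here is a minimal sketch of the title heuristic in Python. It is not the code from the worknote. The `River` class, the `add_item` method, and the `normalize_title` helper are all hypothetical names made up for illustration, and the whitespace and case normalization is an extra assumption; the idea as described only says to remember titles and skip repeats.

```python
def normalize_title(title: str) -> str:
    # Assumption: treat titles that differ only in case or spacing as the same.
    return " ".join(title.split()).casefold()


class River:
    """Hypothetical river that skips items whose title was already seen."""

    def __init__(self) -> None:
        self.seen_titles: set[str] = set()
        self.items: list[dict] = []

    def add_item(self, title: str, link: str, feed_url: str) -> bool:
        """Add the item unless its title already appeared. Returns True if added."""
        key = normalize_title(title)
        if key in self.seen_titles:
            return False  # duplicate story arriving from another feed
        self.seen_titles.add(key)
        self.items.append({"title": title, "link": link, "feed": feed_url})
        return True


river = River()
river.add_item("Storm Nears Coast", "http://example.com/a", "http://example.com/feed1.xml")
river.add_item("Storm Nears Coast", "http://example.com/b", "http://example.com/feed2.xml")
print(len(river.items))
```

The second call is skipped because the title has already flowed through the river, so the story shows up once no matter how many feeds carry it. A real aggregator would probably want to expire old titles after a while so the set doesn't grow forever.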
I added a huge number of feeds to the NYT river, and it’s starting to feel good again. I wanted to share this as a possible best practice for other aggregator developers.
- After running for a few hours — success. The NYT river is back to its rich flow, at a time when there’s lots going on — the presidential election and a hurricane. And there aren’t any duplicates. All is good. 🙂
- It’s been a while since I really looked at the NYT river. They write such good descriptions. You have a pretty good idea what the article is about even without clicking. Much more useful than getting full text, because I get a breadth of the news, and the experience is created by editors who know what they’re doing.