
10th Anniversary of Google Reader Shutdown #

It doesn't feel like it's been 5 years since my last post about Reader, but I guess the past few years have suffered from time compression. For this anniversary I don't have any cool projects to unveil, but that's OK, because David Pierce wrote a great article – Who killed Google Reader? – that serves as a nice encapsulation of the entire saga.

Other Reader team members and I had a chance to talk to David, and the article captures all of the major moments. Some things ended up being dropped though; there are enough twists and turns (in the social strategy alone) that a whole book could be written. Here are some more "fun" tidbits from Reader's history:

The article talks about "Fusion" being Reader's original/internal name (and how the "Reader" name changed how it was perceived and limited its scope). The reason why "Fusion" was not used was that Google "wanted the name [Fusion] for another product and demanded the team pick another one. That product never launched, and nobody I spoke to could even remember what it was." Fusion was the initial name that iGoogle launched under, as can be seen in this article from mid-2005 (iGoogle itself went through some naming changes, going from Fusion to Google Personalized Homepage before ending up as iGoogle (its codename) in 2007). Finding the breadcrumbs of this story was somewhat difficult because Google later launched a product called Google Fusion Tables (not surprisingly, it was also shut down).

In terms of naming, these were the other names that were considered, so "Reader" was a worst-except-for-all-the-rest sort of choice:

  • Google Scoop (which the team took to referring to as "Scooper", as in pooper scooper)
  • Google Viewer
  • Google Finder
  • Google Post

At one point during the (re-)naming saga Chris put in "Transmogrifier" as a placeholder name, with the logo being one of Calvin's cardboard boxes. During the next UI review Marissa Mayer was not amused (or perhaps it was hard to tell what the logo was in those pre-retina days), and the feedback that we got was "logo: no trash".

A low point in the internal dynamics was hit in 2011. I had made some small tweaks (in my 20% time) to fix an annoying usability regression where links were black (and thus not obviously clickable). Since we were getting a lot of flak for it on Twitter, I tweeted from the Reader account saying that it was fixed. A few hours later, I got a friendly-but-not-really ping from a marketing person saying that I needed to run all future tweets by them, since there was an express request from Vic Gundotra to limit all communication about Reader, lest users think that it was still being actively worked on. That was the second-to-last tweet from the official account; the next one was the shutdown announcement.

After Twitter blew up at SXSW 2007, an "I don't need Reader/RSS, Twitter does it for me" vibe started amongst some of the "influencers" of the time. I posted a somewhat oblique tweet comparing the Google Trends rankings of "google reader" and "twitter" (with "toenails" being a neutral term to set a baseline), showing that Reader dwarfed them all (the graph looks very different nowadays). I couldn't understand why someone would want to replace Reader with a product that had no read state, limited posts to 140 characters, and didn't even linkify URLs, let alone unfurl them. In retrospect this was a case of low-end disruption.

Google Reader: A Time Capsule from 5 Years Ago #

It's now been 5 years since Google Reader was shut down. As a time capsule of that bygone era, I've resurrected readerisdead.com to host a snapshot of what Reader was like in its final moments — visit http://readerisdead.com/reader/ to see a mostly-working Reader user interface.

Before you get too excited, realize that it is populated with canned data only, and that there is no persistence. On the other hand, the fact that it is an entirely static site means that it is much more likely to keep working indefinitely. I was inspired by the work that the Internet Archive has done getting old software running in a browser — Prince of Persia (which I spent hundreds of hours trying to beat) is only a click away. It seemed unfortunate that something of much more recent vintage was not accessible at all.

Right before the shutdown I had saved a copy of Reader's (public) static assets (compiled JavaScript, CSS, images, etc.) and used it to build a tool for viewing archived data. However, that required a separate server component and was showing private data. It occurred to me that I could instead achieve much of the same effect directly in the browser: the JavaScript was fetching all data via XMLHttpRequest, so it should just be a matter of intercepting all those requests. I initially considered doing this via a Service Worker, but I realized that even a simple monkeypatch of the built-in object would work, since I didn't need anything to work offline.
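
Here's a minimal sketch of that monkeypatching approach. This is not the actual static_reader code; the canned-response map and how it gets populated are hypothetical stand-ins:

const ARCHIVE = new Map<string, string>(); // request path -> canned JSON

const realOpen = XMLHttpRequest.prototype.open;
XMLHttpRequest.prototype.open = function (method: string, url: string) {
  // Remember the path so that send() can decide whether to short-circuit.
  (this as any)._archivePath = new URL(url, location.href).pathname;
  return realOpen.apply(this, arguments as any);
};

const realSend = XMLHttpRequest.prototype.send;
XMLHttpRequest.prototype.send = function (body?: Document | null) {
  const canned = ARCHIVE.get((this as any)._archivePath);
  if (canned === undefined) {
    return realSend.call(this, body);
  }
  // Complete asynchronously with a fake 200 response, as a real XHR would.
  setTimeout(() => {
    Object.defineProperties(this, {
      readyState: {value: XMLHttpRequest.DONE},
      status: {value: 200},
      responseText: {value: canned},
    });
    // Reader's 2011-vintage code listens via onreadystatechange.
    this.onreadystatechange?.call(this, new Event("readystatechange"));
  }, 0);
};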

The resulting code is in the static_reader directory of the readerisdead project. It definitely felt strange mixing this modern JavaScript code (written in TypeScript, with a bit of async/await) with Reader's 2011-vintage script. However, it all worked out, without too many surprises. Coming back to the Reader core structures (tags, streams, preferences, etc.) felt very familiar, but there were also some embarrassing moments (why did we serve timestamps as seconds, milliseconds, and microseconds, all within the same structure?).

As for myself, I still use NewsBlur every day, and have even contributed a few patches to it. The main thing that's changed is that I first read Twitter content in it (using pretty much the same setup I described a while back), with a few other sites that I've trained as being important also getting read consistently. Everything else I read much more opportunistically, as opposed to my completionist tendencies of years past. This may just be a reflection of the decreased amount of time that I have for reading content online in general.

NewsBlur has a paid tier, which makes me reasonably confident that it'll be around for years to come. It went from 587 paid users right before the Reader shutdown announcement to 8,424 shortly after, to 5,345 now. While not the kind of up-and-to-the-right curve that would make a VC happy, it should hopefully be a sustainable level for the one-person team (hi Samuel!) to keep working on it, Pinboard-style.

Looking at the other feed readers that sprang up (or got a big boost in usage) in the wake of Reader's shutdown, they all still seem to be around: Feedly, The Old Reader, Feed Wrangler, Feedbin, Inoreader, Reeder, and so on. One of the more notable exceptions is Digg Reader, which itself was shut down earlier this year. But there are also new projects springing up, like Evergreen and Elytra, so I'm cautiously optimistic about the feed reading space.

Google Reader's Launch Was 10 Years Ago #

Google Reader was launched 10 years ago today. Though it did not live to see its 10th birthday, pieces of it still live on. Specifically, FRBE (the Feed Reader Backend) started out as Reader's backend, but over time it evolved into a reusable piece of Google infrastructure, powering the Feed API, Blogger's following feature, and many other things. Thanks to an anonymous source within Google, it's possible to see that this bit of Reader is still up and serving requests.

frbe tasks in borg cells

Using Google Reader's reanimated corpse to browse archived data #

After getting all my data out of Google Reader, the next step was to do something with it. I wrote a simple tool to dump data given an item ID, which let me do spot checks that the archived data was complete. A more complete browsing UI was needed, but this proved to be slow going. It's not a hard task per se, but the idea of re-implementing something that I worked on for 5 years didn't seem that appealing.

It then occurred to me that Reader is a canonical single-page application: once the initial HTML, JavaScript, CSS, etc. payload is delivered, all other data is loaded via relatively straightforward HTTP calls that return JSON (this made adding basic offline support relatively easy back in 2007). Therefore, if I served the archived data in the same JSON format, I should be able to browse it using Reader's own JavaScript and CSS. Thankfully this all occurred to me the day before the Reader shutdown, and thus I had a chance to save a copy of Reader's JavaScript, CSS, images, and basic HTML scaffolding.

zombie_reader is the implementation of that idea. It's available as another tool in my readerisdead.com collection. Once pointed at a directory with an archive generated by reader_archive, it parses it and starts an HTTP server on port 8074. Beyond serving the static resources that were saved from Reader, the server uses web.py to implement a minimal (read-only) subset of Reader's API.

The tool required no modifications to Reader's JavaScript or CSS beyond fixing a few absolute paths1. Even the alternate header layout (without the Google+ notification bar) is something that was natively supported by Reader (for the cases where the shared notification code couldn't be loaded). It also only uses publicly-served (compressed/obfuscated) resources that had been sent to millions of users for the past 8 years. As the kids say these days, no copyright intended.

A side effect is that I now have a self-contained Reader installation that I'll be able to refer to years from now, when my son asks me how I spent my mid-20s. It also satisfies my own nostalgia kicks, like knowing what my first read item was. In theory I could also use this approach to build a proxy that exposes Reader's API backed by (say) NewsBlur's, and thus keep using the Reader UI to read current feeds. Beyond the technical issues (e.g. impedance mismatches, since NewsBlur doesn't store read or starred state as tags, or have per-item tags in general), that seems like an overly backwards-facing option. NewsBlur has its own distinguishing features (e.g. training and "focus" mode)2, and forcing it into a semi-functional Reader UI would result in something that is worse than either product.

  1. And changing the logo to make it more obvious that this isn't just a stale tab from last week. The font is called Demon Sker.
  2. One of the reasons why I picked NewsBlur is that it has been around long enough to develop its own personality and divergent feature set. I'll be the first to admit that Reader had its faults, and it's nice to see a product that tries to remedy them.

Getting ALL your data out of Google Reader #

Update on July 3: The reader_archive and feed_archive scripts are no longer operational, since Reader (and its API) has been shut down. Thanks to everyone that tried the script and gave feedback. For more discussion, see also Hacker News.

There remain only a few days until Google Reader shuts down. Besides the emotions1 and the practicalities of finding a replacement2, I've also been pondering the data loss aspects. As a bit of a digital pack rat, the idea of not being able to get at a large chunk of the information that I've consumed over the past seven and a half years seems very scary. Technically most of it is public data, and just a web search away. However, the items that I've read, tagged, starred, etc. represent a curated subset of that, and I don't see an easy way of recovering those bits.

Reader has Takeout support, but it's incomplete. I've therefore built the reader_archive tool that dumps everything related to your account in Reader via the "API". This means every read item3, every tagged item, every comment, every like, every bundle, etc. There's also a companion site at readerisdead.com that explains how to use the tool, provides pointers to the archive format and collects related tools4.

Additionally, Reader is, for better or worse, the paper of record for public feed content on the internet. Beyond my 545 subscriptions, there are millions of feeds whose histories are best preserved in Reader. Thankfully, ArchiveTeam has stepped up. I've also provided a feed_archive tool that lets you dump Reader's full history for feeds for your own use.5

I don't fault Google for providing only partial data via Takeout. Exporting all 612,599 read items in my account (and a few hundred thousand more from subscriptions, recommendations, etc.) results in almost 4 GB of data. Even if I'm in the 99th percentile for Reader users (I've got the badge to prove it), providing hundreds of megabytes of data per user would not be feasible. I'm actually happy that Takeout support happened at all, since my understanding is that it was all during 20% time. It's certainly better than other outcomes.

Of course, I've had 3 months to work on this tool, but per Parkinson's law, it's been a bit of a scramble over the past few days to get it all together. I'm now reasonably confident that the tool is getting everything it can. The biggest missing piece is a way to browse the extracted data. I've started on reader_browser, which exposes a web UI for an archive directory. I'm also hoping to write some more selective exporters (e.g. from tagged items to Evernote for Ann's tagged recipes). Help is appreciated.

  1. I am of course saddened to see something that I spent 5 years working on get shut down. And yet, I'm excited to see renewed interest and activity in a field that had been thought fallow. Hopefully not having a disinterested incumbent will be for the best.
  2. Still a toss-up between NewsBlur and Digg Reader.
  3. Up to a limit of 300,000, imposed by Reader's backend.
  4. If these command-line tools are too unfriendly, CloudPull is a nice-looking app that backs up subscriptions, tags and starred items.
  5. Google's Feed API will continue to exist, and it's served by the same backend that served Google Reader. However it does not expose items beyond recent ones in the feed.

Google Reader Shutdown Tidbits #

Based on a lunch with Alan Green at Google on June 21, 2013. Posted on April 20, 2024, but backdated to the time that this was written in a private document.

The shutdown timing was mainly technical. There have been enough infrastructure changes that the Reader codebase has rotted, and it cannot be pushed to prod anymore. It sounded like there hadn't been any pushes for ~6 months. I'm pretty sure I pushed shortly before I left (October 2012), so it's a bit surprising that the code rotted that quickly.

The shutdown is mainly being handled by the SREs (Alan will actually be on vacation for the two weeks before July 1). It effectively sounded like they were going to be removing the GFE rules on July 1, and then take their time actually turning off any servers, since that actually involves understanding how things work and what depends on what. The FRBEs will definitely be running for a while longer, since there are other Google services that depend on them.

All of the feed data is going to be given to the Feeds team in Zurich (they also inherited the PubSubHubbub hub and maybe even the AJAX Feed API). They will hopefully archive it. Matt Cutts has been part of the cabal that has been trying to make the shutdown be handled as reasonably as possible.

He said politics didn't really factor into it. If it had been politics, the easiest thing would have been to do nothing, and let the service run as is indefinitely. Wipeout (Reader is not compliant, data for deleted Gaia accounts is still present) was a slight factor, but if that had been the only reason, it still would have been easier to let it keep running.

Once the shutdown decision was made, they needed to put someone's name on the blog posts (Google Blog, Reader Blog). Alan said he was OK with his name being on the Reader Blog, since he was the last engineer standing. However, it didn't make sense for his name (as a random engineer) to be on a Google Blog post that announced the shutting down of several services. It made more sense for a VP, or at least a director. PR asked several directors and VPs (including Alan Noble, the SYD site director), and they all begged off, saying that they had (external) people they were going to be meeting in the next couple of weeks, and if their name was on the blog post, they would just get a lot of hate about shutting down Reader. PR then asked Alan, and after thinking about it, he declined. PR thanked him for seriously considering it, and then went to Alan's manager and asked him to ask Alan. Alan declined again. PR asked his manager to ask him again, and Alan said he would only do it if they promoted him to director, and that was the last that he heard of it. Eventually Urs said he would be OK with putting his name on it. Alan seemed to have a pretty good opinion of Urs; of all the VPs, he was the most willing to speak truthfully about Reader.

PR also gave Alan a document for posting to reader-discuss@ and internal Google+. It was apparently terrible, but he was at least allowed to rewrite it. In general it sounded like PR is now very involved in internal communications; Alan sounded rather cynical about that.

The blog post announcing the shutdown was published one day early. The idea was to take advantage of the new Pope being announced and Andy Rubin being replaced as head of Android, so that the Reader news might be drowned out. PR apparently didn't realize that the kinds of people that care about the other two events (especially the Pope) are not the same kind of people that care about Reader, so it didn't work.

This also screwed up the internal announcement plans. The idea had been to announce the management reshuffle on Tuesday, have a town hall about it Wednesday morning, and then announce the Reader shutdown on Wednesday afternoon, leaving TGIF (now on Thursdays) as the venue to discuss it. Since it all happened on Tuesday, the town hall ended up being dominated with Reader questions. They continued at TGIF, to the point where Sergey held up a microphone cable and said “If I bite down on this, will the pain stop?” Urs was the only VP who had decent answers to the Reader questions (Matt Cutts in particular spoke for a while defending Reader). About a month (?) later, there was a “bring your parent(s) to work day”, at which they held a special TGIF in Shoreline Amphitheatre. Parents were apparently encouraged to ask questions, and the first parent asked about the Reader shutdown, which elicited a lot of laughter from all the Googlers.

The People Behind Google Reader #

If Google Reader were a movie or TV show, at the end of the spectacle the credits would roll and you would get to see who was responsible for what you just saw. But in today's age, software "about box" credits are no longer common.

I thought it might be nice, for the sake of posterity, to list all those who worked on Reader over the years. There's been a lot of discussion about Reader's imminent shutdown, but most of it focused on Google (the corporate entity) and its strategy. However, at the end of the day, Reader was built by people. I and a few others have been lucky enough to be more visible, but everyone involved deserves credit and thanks. This is especially the case since as Chris and Brian have described, Reader faced quite a few internal struggles. As I remember it, nearly everyone on the Reader team explicitly requested to join it, and often had to fight to keep their role.

Coming up with this list was difficult, both technically (can you name all your coworkers going back 8 years?) and because it was tough to decide where to draw the line. Google is a big company, and many people in many supporting roles helped Reader out. First, here's a list of all full-time Reader team members:

Additionally, here are others who contributed to Reader in various roles at Google:

  • Design: Micheal Lopez
  • Executives: Greg Badros, Jeff Huber, Pavni Diwanji
  • Legal: Halimah DeLaine
  • Localization: Gabriella Laszlo, John Saito, Katsuhiko Momoi, Sasan Banava
  • PR: Nate Tyler, Oscar Shine, Sonya Boralv
  • Product Management: Bruce Polderman, Sabrina Ellis
  • Product Marketing: Kevin Systrom, Louis Gray, Peter Harbison, Robby Stein, Tom Stocky, Zach Yeskel
  • Quality Assurance: Amar Amte, Jan Carpenter, Kavitha Venkatesan, Madhuri Kulkarni, Thanh Le
  • Site Reliability Engineering: Chen Wang, Christoph Pfisterer, David Parrish, Ed Bardsley, Eric Weigle, Gary Luo, Huaxia Xia, James Long, Jerry Zhiwei Cen, Keith Brady, Lantian Zheng, Liren Chen, Matthew Eastman, Nadav Samet, Niall Sheridan, Olivier Beyssac, Patrick Scott, Paul Chien, Pereira Braga, Petru Paler, Sara Smollett, Scott Lamb, Sebastian Adamczyk, Vladimir Filipović, Wensheng Wang, Yu Liao
  • 20% time and additional engineering: Aaron Boodman, Abdulla Kamar, Akshay Patil, Aman Bhargava, Brad Fitzpatrick, Brett Bavar, Brett Slatkin, Charles Chen, Ed Ho, John Pongsajapan, Olga Stroilova, Peter Baldwin, Steve Jenson, Steve Lacey, T.V. Raman, Wiktor Gworek
  • User Experience Designers: Jonathan Terleski, Sean McBride
  • User Experience Research: Anna Avrekh, David Choi, Nika Smith, Theresa Sobczak
  • User Support: Graham Waldon, Paul Wilcox, Wen-Ai Yu

I'm sure I'm missing names and got things wrong, so don't hesitate to contact me with corrections. And to everyone that I worked with on Reader, it was a pleasure!

P.S. For another take on the people behind Reader, see Chris's #unsungHeroesOfGoogleReader tweets.

Being a new parent as told through Reader trends #

Paternity leave has meant lots of 10-20 minute gaps in the day that can be filled with Reader catchup:

Google Reader Trends 30 day chart

Even when trying to put the baby to sleep at 1am or 5am, thanks to 1-handed keyboard shortcuts:

Google Reader Trends hour of day chart

Google Reader Social Retrospective #

With the upcoming transition of social features in Google Reader to Google+, I thought this would be a good time to look back at the notable social-related events in Reader's history. For those of you who are new here, I was Reader's tech lead from 2006 to 2010.

Late 2004 to early 2005: Chris Wetherell starts work on "Fusion", one of the 20% projects that serve as prototypes for Google Reader. Among other neat features, it has a "People" tab that shows you what other people on the system are subscribed to and reading. There's no concept of a managed friends list; after all, when the users are just a few dozen co-workers, we're all friends, right?

September 2005: Ben Darnell and Laurence Gonsalves add the concept of "public tags" to the nascent Reader backend and frontend. There are no complex ACLs, just a single boolean that controls whether a tag is world-readable.

October 2005: A remnant of the "People" tab is present in the HTML of the launched version of Google Reader, and an eagle-eyed Google Blogoscoped forum member notices it and speculates as to its intended use.

March 2006: Tag sharing launches, along with the ability to embed a shared tag as a widget in the sidebar of your blog or other sites. On one hand, tag sharing is quite flexible: you can share both individual items by applying a tag to them, and whole feeds (creating spliced streams) if you share folders. On the other hand, having to create a tag, share it and manually apply it each time is rather tedious. A lot of users end up sharing their starred items instead, since that enables one-click sharing.

Summer of 2006: As part of Brad Hawkes's summer internship, he looks into what can be done to make shared tags more discoverable (right now users have to email each other URLs containing 20-digit user IDs). He whips up a prototype that iterates over a user's Gmail contacts and lists shared tags that each contact might have. This is neat, but is shelved for both performance (there's a lot of contacts to scan) and privacy (who exactly is in a user's address book?) concerns.

Reader "share" action

September 2006: Along with a revamped user interface, Reader re-launches with one-click sharing, allowing users to stop overloading starred items.

May 2007: Brad graduates and comes back to work on Reader full-time. His starter project is to beef up Reader's support for that old-school social network, email.

Fall of 2007: There is growing momentum within Google to have a global (cross-product) friend list, and it looks like the Google Talk buddy list will serve as the seed. Chris and I start to experiment with showing shared items from Talk contacts. We want to use this feature with our personal accounts (i.e. real friends), but at the same time we don't want to leak its existence. I decide to (temporarily) call the combined stream of friends' shared items "amigos". Thankfully, we remember to undo this before launch.

Friends' shared items tree

December 2007: After user testing, revamps, and endless discussions about opt-in/out, shared items from Google Talk buddies launches. Sharing is up by 25% overnight, validating that sharing to an audience is better than doing it into the void. On the other hand, the limitations of Google Talk buddies (symmetric relationships only, contact management has to happen within Gmail or Talk, not Reader) and communication issues around who could see your shared items lead to some user stress too.

Spring of 2008: With sharing in Reader picking up steam, a few aggregators and leaderboards of shared items start to spring up. Louis Gray comes to the attention of the Reader team (and its users) by discovering the existence of ReadBurner before its creator is ready to announce it.

May 2008: Up until this point sharing has been without commentary; it was up to the reader of the shared item to decide if it had been shared earnestly, ironically, or to disagree with it. "Share with note" gives users an opportunity to attach a (hopefully pithy) commentary to their share. Also in this launch is the "Note in Reader" bookmarklet (internally called "Tag Anything") that allows users to share arbitrary pages through Reader.

August 2008: Incorporating the lessons learned from Reader's initial friends feature, the preferred Google social model is revamped. Instead of a symmetric friend list based on Google Talk buddies, there is a separate, asymmetric list that can be managed directly within Reader. The asymmetry is "push"-style: users decide to share items with some of their contacts, but it's up to those contacts to actually subscribe if they wish (think "Incoming" stream on Google+, where people are added to a "See my Reader shared items" circle). This feature is brought to life by Dolapo Falola, who injects some much-needed humor into the Reader code: the unit tests use the Menudo band members to model relationships, and friends acquire a (hidden) "ex-girlfriend" bit.

New comments indicator

March 2009: After repeated user requests (and enabled by more powerful ACL support added by Susan Shepard), comments on shared items are launched. Once again Dolapo is on point for the frontend side, while Derek Snyder does all the backend work and makes sure that Reader won't melt down when checking whether to display that "you have new comments" icon. The ability of the backend and user interface to handle multiple conversations about an item is stress-tested with a particularly popular Battlestar Galactica item.

May 2009: Bundles are launched, extending sharing from individual tags to collections of feeds.

Hearts when like-ing an item

July 2009: Continuing the social learning process, the team (and Google) revamps the friends model once again, switching to an asymmetric "pull"-style (i.e. following) model. This is meant to be "pre-consistent" with the upcoming Google Buzz launch. Also included in this launch are better ties to Google Profiles and the ability to "like" items. In general there are so many moving parts that it's amazing that Jenna's head doesn't explode trying to design them all.

Also as part of this launch, intern Devin Kennedy's trigonometry skills are put to good use in creating an easter egg animation triggered when liking or un-liking an item after activating the Konami code.

August 2009: Up until this point, one-click sharing had mainly been for intra-Reader use only (though there were a few third-party uses, some hackier than others). With the launch of Send to (also Devin's work), Reader can now "feed" almost any other service.

February 2010: The launch of Google Buzz posed some interesting questions for the Reader team. Should items shared in Reader show up in Buzz? (yes!) Should we allow separate conversations on an item in Buzz versus Reader? (no!) With a lot of behind the scenes work, sharing and comments in Reader are re-worked to have close ties to Buzz, such that even non-Reader-using friends can finally get in on the commenting action.

March 2010: Partly as a tongue-in-cheek reaction to social developments within Google, and partly to help out some Buzz power users who were complaining that all the social features in Reader were slowing it down, I add a secret (though not for long) anti-social mode.

May 2010: Up until this point, it was possible to have publicly-shared items but only allow certain friends to comment on them. Though powerful, this amount of flexibility was leading to complexity and user confusion and workarounds. To simplify, we switch to offering just two choices for shared items, and in either case if you can see the shared item, you can comment on it.

As you can see, it's been a long trip, and with the switch to Google+ sharing features, Reader is on its fourth social model. This much experimentation in public led to some friction, but I think this incremental approach is still the best way to operate. Whether you're a sharebro, a Reader partier, a Gooder fan, the number 1 sharer or someone who "like"-d someone else, I am very grateful that you were part of this experiment (and I'm guessing the rest of the past and present team is grateful too). And if you're looking to toast Reader for all its social stumbles (er, accomplishments), the preferred team drink is scotch.

An interesting bug #

As Jonathan has blogged, "What is the hardest bug you've ever tackled?" is an interesting conversation starting point with engineers, one that I often use to start (phone) interviews. I usually rephrase it as "Describe an interesting or difficult bug that you ran into", since "hardest" often causes people to freeze up as they ponder whether the bug they have in mind is actually the hardest. In any case, most bugs become interesting if you ask "why?" enough.

Along these lines, here's a bug that I ran into in mid-2007 while I was working on Google Reader: Soon after a production push, we noticed that some users were complaining that Reader wasn't loading properly when they reloaded the page. Stranger still, others said that it wasn't working properly initially, but after a few reloads it would start working. Checking things in the office revealed similar inconsistent results: Reader would load for some but not for others. For those for whom Reader hadn't loaded successfully, it turned out to be because of a 404 that was returned when trying to load Reader's main JavaScript file.

This happened soon after Gears support was added to Reader, so we initially suspected some interaction with offline support. Perhaps an old version of the HTML was being used by some users, and that contained a link to a version of the JavaScript file that we didn't serve anymore. However, some quick Dremel-ing showed that we had never served the URLs that triggered 404s until the push began. Stranger still, not all requests for those URLs resulted in 404, only about half.

At this point a bit of background about Reader's JavaScript infrastructure is necessary. As previously mentioned, Reader uses the Closure Compiler for processing and minimization of JavaScript. Reader does runtime compilation, since it supports per-user experiments that would make it prohibitive to compile all combinations at build or push time. Instead, when a user requests their JavaScript file, the set of experiments for them is determined, and if we haven't encountered it before, a new variant is compiled and served. JavaScript (and other static resources) are served with a checksum of their contents in the filename. This allows each URL to be served with a far-future cache expiration header, and makes sure that when its content changes users will pick up changes by virtue of having a new URL to fetch.
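
The scheme, in sketch form (Reader's actual frontend was written in Java; the names and the use of MD5 here are illustrative rather than a claim about the production code):

import {createHash} from "crypto";

// Compiled variants, keyed by the set of experiments they were built for.
const compiledVariants = new Map<string, {url: string; js: string}>();

function getScriptVariant(experiments: string[], compile: () => string) {
  const key = [...experiments].sort().join(",");
  let variant = compiledVariants.get(key);
  if (!variant) {
    const js = compile();
    // The checksum of the contents goes into the filename: the URL changes
    // exactly when the contents do, so each URL can be served with a
    // far-future cache expiration header.
    const checksum = createHash("md5").update(js).digest("hex");
    variant = {url: `/reader/ui/${checksum}-main.js`, js};
    compiledVariants.set(key, variant);
  }
  return variant;
}

Note the hidden assumption: if compile() is not deterministic, two frontend machines will derive different checksums for the same experiment set, and a URL handed out by one will 404 on another. That is exactly the failure mode described next.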

The JavaScript URL is used in two places, once embedded as a <script src="..."> tag in the HTML, and once when requesting the file itself. The aforementioned compilation and serving steps happen independently on each (identical) frontend machine, and some machines ended up with one idea of what the URL should be, while others had a different one. Since the frontends are stateless, it was quite likely for users to request the JavaScript from a different machine than the one that had served them the HTML containing the URL. If there was a mismatch, the 404 would happen. However, if the user reloaded enough times, they would eventually hit a pair of machines that agreed on the JavaScript URL.

I said the users were getting "seemingly" identical JavaScript, but there was actually a slight difference when doing a diff (which explained the difference in checksums). One variant contained return/^\s*$/.test(str == null ? "" : String(str)) while the other had return/^\s*$/.test((str == null ? "" : String(str))) (note the extra parentheses in the test() argument). The /^\s*$/ regular expression was distinctive enough that it was easy to map this as being the compiled version of the Closure function goog.string.isEmptySafe, which is defined as:

goog.string.isEmptySafe = function(str) {
  return goog.string.isEmpty(goog.string.makeSafe(str));
};

The goog.string.isEmpty and goog.string.makeSafe calls get inlined, hence the presence of the regular expression test and String() directly (note that the implementations may have changed slightly since 2007).

Now that I knew where to look, I began to turn compiler passes off until the output became stable, and it became apparent that the inlining pass itself was responsible. The functions would not be inlined in the same order (i.e. goog.string.isEmpty and then goog.string.makeSafe, or vice versa), and in one case the compiler decided to add extra parentheses for safety. Specifically, when inlining, the compiler would check whether the replacement AST node was of lower precedence than the one it was replacing. If it was, a set of parentheses was added to make sure that the meaning was not changed.

The current compiler inlining pass is very different from the one used at that point, but the relevant point here is that the compiler would use a HashSet to keep track of what functions needed to be inlined. The hash set was of Function instances, where Function was a simple class that had a couple of Rhino Node references. Most importantly, it didn't define either equals() or hashCode(), so identity/memory address comparisons and hash code implementations were used.

When actually inlining functions, the compiler pass would iterate through the HashSet, and since the Function instances corresponding to goog.string.isEmpty and goog.string.makeSafe had different addresses depending on the machine, they could be encountered in a different order. The fix was to switch the list of functions to inline to a List (Set semantics were not necessary, especially given that Function instances used identity comparisons so duplicates were not possible).

The inlining compiler pass had used a HashSet for a long time, so I was curious why this only manifested itself then. The explanation turned out to be prosaic: this was the first Reader release where goog.string.isEmptySafe was used, and there were no other places where there were nested inlineable function calls. (This bug happened around the time we switched to JDK6, which changed HashSet internals, but we hadn't actually made the switch at that point, so it was not involved.)

None of this was reproducible when running a frontend locally or in the staging environment, since all those setups have a single frontend instance (they're of very low traffic). In those cases, no matter which version was compiled and which URL was generated, it was guaranteed to be servable. To prevent the recurrence of similar bugs, I added a unit test that compiled Reader's JavaScript locally several times and made sure that the output did not change. Though not foolproof, it has caught a couple of other such problems before releases made it out into production.
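
The test boiled down to something like this (the original presumably lived in Reader's Java codebase; this is just its shape):

// Compile identical inputs several times and require byte-identical output.
function assertDeterministicCompilation(
    compile: () => string, runs: number = 5): void {
  const first = compile();
  for (let i = 1; i < runs; i++) {
    if (compile() !== first) {
      throw new Error(`Compiled output differed on run ${i + 1} of ${runs}`);
    }
  }
}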

The main reason why I enjoyed fixing this bug was because it involved non-determinism. However, unlike other non-deterministic bugs that I've been involved in, the triggering conditions were not so mysterious that it took months to solve.

Bloglines Express, or How I Joined The Google Reader Team #

Since Bloglines is shutting down on November 1, I thought it might be a good time to recount how I joined the (nascent) Google Reader team thanks to a Greasemonkey script built on top of Bloglines.

It was the spring of 2005. I had switched from NetNewsWire to Bloglines the previous fall. My initial excitement at being able to get my feed fix anywhere was starting to wear off -- Bloglines was held up by some as a Web 2.0 poster child, but the site felt surprisingly primitive compared to contemporary web apps. For example, such a high-volume content consumption product begged for keyboard shortcuts, but the UI was entirely mouse-dependent. I initially started to work on some small scripts to fill in some holes, but fighting with the site's markup was tiring.

I briefly considered building my own feed reader, but actually crawling, storing and serving feed content didn't seem particularly appealing. Then I remembered that Bloglines had released an API a few months prior. The API was meant to be used by desktop apps (NetNewsWire, FeedDemon and BlogBot are the initial clients mentioned in the announcement), but it seemed like it would also work for a web app (the API provided two endpoints, one to get the list of subscriptions as OPML, and one to get subscription items as RSS 2.0).

This was also the period when Greasemonkey was really taking off, and I liked the freedom that Greasemonkey scripts provided (piggyback on someone else's site and let them do the hard work, while you can focus on the UI only). However, this was before GM_xmlhttpRequest, so it looked like I'd need a server component regardless, in order to fetch and proxy data from the Bloglines API.

Then, it occurred to me that there was no reason why Greasemonkey had to inject the script into a "real" web page. If I targeted the script at http://bloglines.com/express (which is a 404) and visited that URL, the code that was injected could make same-origin requests to bloglines.com and have a clean slate to work with, letting me build my own feed reading UI.
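
In outline, the trick looked something like this (a reconstruction rather than the original source; renderSubscriptions is a hypothetical stand-in for the actual UI-building code):

// ==UserScript==
// @name     Bloglines Express (sketch)
// @include  http://bloglines.com/express*
// ==/UserScript==

// The 404 page is a blank slate, but code injected into it is same-origin
// with bloglines.com, so the Bloglines API is directly reachable.
document.body.innerHTML = '<div id="reader"></div>';

const xhr = new XMLHttpRequest();
xhr.open("GET", "/listsubs"); // the subscriptions-as-OPML endpoint
xhr.onreadystatechange = () => {
  if (xhr.readyState === 4 && xhr.status === 200) {
    renderSubscriptions(xhr.responseXML); // hypothetical UI-building code
  }
};
xhr.send();

declare function renderSubscriptions(opml: Document | null): void;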

Once I had the basic framework up and running, it was easy to add features that I had wanted:

  • Gmail-inspired keyboard shortcuts.
  • Customized per-item actions, for example for finding Technorati and Feedster backlinks, or posting to Del.icio.us (cf. "send to" in Reader).
  • Specialized views for del.icio.us and Flickr feeds (cf. photo view in Reader).
  • Inline viewing of original page content (including framebuster detection).

A few weeks into this, I saw an email from Steve Goldberg saying that a feed reading project was starting at Google. I got in touch with him about joining the team, and also included a pointer to the script in its state at the time. I don't know if it helped, but it clearly didn't hurt. As it turned out, Chris Wetherell, Jason Shellen, Laurence Gonsalves and Ben Darnell all had (internal) feed reading projects in various states; Reader emerged out of our experiences with all those efforts (Chris has a few more posts about Reader's birth).

Once the Reader effort got off the ground it seemed weird to release a script that was effectively going to be a competitor, so it just sat in my home directory (though Mark Pilgrim did stumble upon it when gathering scripts for Greasemonkey Hacks). However, since Bloglines will only be up for a few more days, I thought I would see if I could resurrect the Greasemonkey script as a Chrome Extension. Happily, it seems to work:

  1. Install this extension (it requires access to all sites since it needs to scrape data from the original blogs).
  2. Visit http://bloglines.com/listsubs to enter/cache your HTTP Basic Auth credentials for the Bloglines API.
  3. Visit http://persistent.info/greasemonkey/bloglines-express/ to see your subscriptions (unfortunately I can't inject content into bloglines.com/express since Chrome's "pretty 404" kicks in).

Or if all that is too complicated, here's a screencast demonstrating the basic functionality:

For the curious, I've also archived the original version of the Greasemonkey script (it actually grew to use GM_xmlhttpRequest over time, so that it could load original pages and extra resources).

Somewhat amusingly, this approach is also roughly what Feedly does today. Though they also have a server-side component, at its core is a Firefox/Chrome/Safari extension that makes Google Reader API requests on behalf of the user and provides an alternative UI.

Google Reader Play Bookmarklet #

It occurred to me that it'd be pretty easy to make a bookmarklet for the recently-launched Google Reader Play:

PlayThis!

All it does is take the current page's feed and display it in the Play UI. You may find this useful when discovering a new photo-heavy site (or anything else with a feed, like a Flickr user page), or when you want to share, star or like an item from a site you're not subscribed to (you can also use the regular Reader subscribe bookmarklet for that).

P.S. If you're reading this in a feed/social content reader, you'll most likely have to view the original post, as the javascript: URL on the bookmarklet link will no doubt get sanitized away.

Google Reader and Closure Tools #

Since Google Reader makes heavy use of the recently-open sourced Closure Tools, Louis Gray asked me to give a client's perspective on using them. He wrote up a great post summarizing my thoughts, and if you'd like to see the raw input, I've included it below:


There are three pieces to the Closure Tools, the compiler, the library and the template system. They appeared gradually, roughly in that order. The compiler in its current incarnation dates back to Gmail in 2004 (which is why Paul Buchheit refers to it as the "Gmail JavaScript compiler"), with the library and the template system starting a couple of years later.

Reader development started in early 2005, which meant that we always had the compiler available to us, and so except for early prototypes, we always ran our code through it. Until the last month leading up to the Reader launch in October 2005, the size benefits of the compiler were less important, since we were less focused on download time (and performance in general) and more on getting basic functionality up and running. Instead, the extra checks that the compiler does (e.g. catching functions called with the wrong number of parameters, or typos in variable names) made it easier to catch errors much earlier. We have set up our development mode for Reader so that when the browser is refreshed, the JavaScript is recompiled on the server and served with the reloaded page. This results in a tight development loop that makes it possible to catch JavaScript errors as early as possible.

Since Reader development started before the library and template tools were available, we had homegrown code for doing both. There was actually shared code that did some of the same things as basic library functionality (e.g. a wrapper around getting the size of the window, handling different browser versions and quirks). However, that shared code was of various vintages (copied from project to project) and therefore not very consistent in style or quality. Erik Arvidsson's post talks a bit more about the inception of the Closure library (he's one of the co-creators, along with Dan Pupius).

Reader began using the Closure library and template systems gradually, first for new code and then replacing usages of the old shared library and our homegrown code. It was a gradual process, though I tried to keep it organized by doing an audit of all the usages of old code and their Closure equivalents, so that work could more easily be divided up (this was handled during "fixit" periods, where we focus on code quality more than features).

The benefits of the compiler system are tremendous. The most obvious are the size ones: without it, Reader's JavaScript would be 2 megabytes; with it, that goes down to 513K, and 184K with gzip (the compiler's output is actually optimized for gzip, since nearly all browsers support it). However, all of the above-mentioned checks, as well as many more that have been added over the past few years (especially type annotations), make it much more manageable to have a large JavaScript codebase that doesn't get out of control as it ages and accumulates features.

The library means that Reader is much less concerned about browser differences, since it tries very hard to hide all those away. Over time, the library has also moved up the UI "stack", going from just basic low level code (e.g. for handling events) to doing UI widgets. This means that it's not a lot of work to do auto-complete widgets, menus, buttons, dialogs, drag-and-drop, etc. in Reader.

One thing to keep in mind is that, as mentioned in the announcement blog post, these tools all started out as 20% projects, and for the most part are still dependent on it. If one project needs a feature from the compiler or the library that doesn't exist, they're encouraged to contribute it, so that other teams can benefit too. To give a specific example, Reader had some home-grown code for locating elements by class name and tag name (a much more rigid and simplified version of the flexible CSS selector-based queries that you can do with jQuery or with the Dojo-based goog.dom.query). As part of the process of "porting" to the Closure library, we realized that though there was an equivalent library function, goog.dom.getElementsByTagNameAndClass, it didn't use some of the more recent browser APIs that could make it much faster (e.g. getElementsByClassName and the W3C Selector API). Therefore we not only switched Reader's code to use the Closure version, but we also incorporated those new API calls in it. This ended up making all other apps faster; it was very nice to get a message from Dan Pupius saying that the change had shaved off a noticeable amount of time in a common Gmail operation.
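
The fast-path idea is roughly the following (an illustration, not the Closure library's actual implementation):

// Use native class-name lookup when the browser has it, otherwise fall
// back to scanning elements by tag name.
function getByTagNameAndClass(tag: string, className?: string): Element[] {
  if (className && document.getElementsByClassName) {
    // Fast path: native lookup by class, then filter by tag.
    return Array.from(document.getElementsByClassName(className)).filter(
        (el) => tag === "*" || el.tagName.toLowerCase() === tag.toLowerCase());
  }
  // Slow path: fetch everything with the tag, check each class attribute.
  return Array.from(document.getElementsByTagName(tag || "*")).filter(
      (el) => !className ||
          (" " + el.className + " ").indexOf(" " + className + " ") !== -1);
}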

You can tell that there's something special about this when you look at the ex-Googlers cheering its release. If it had been some proprietary antiquated system that they had all been forced to use, they wouldn't have been so excited that it was out in the open now :)

If you'd like to know more about Closure, I recommend keeping an eye on Michael Bolin's blog. He already has a few posts about what makes it special, and I'm sure there are more coming.

Exporting likes from Google Reader #

I started this as another protip comment on this FriendFeed thread about Reader likes but it got kind of long, so here goes:

Reader recently launched liking (and a bunch of other features). One of the nice things about liking is that it's completely public*. It would therefore make sense to be pretty liberal with liking data, and in fact Reader does try to expose liking in our feeds. If you look at my shared items feed you will see a bunch of entries like:

<gr:likingUser>00298835408679692061</gr:likingUser>
<gr:likingUser>11558879684172144796</gr:likingUser>
<gr:likingUser>07538649935038400809</gr:likingUser>
<gr:likingUser>09776139491686191852</gr:likingUser>
<gr:likingUser>02408713980432217881</gr:likingUser>
<gr:likingUser>05429296530037195610</gr:likingUser>

These are the users that have liked the item. Users are represented by their IDs, which you can use to generate Reader shared page URLs. More interestingly, you can plug these into the Social Graph API to see who these users are.
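
For instance, here's one way a script could pull those IDs out of a feed (the gr: namespace URI below is my best recollection; verify it against a real feed before relying on it):

const GR_NS = "http://www.google.com/schemas/reader/atom/";

// Extract liker user IDs from a Reader feed fetched in the browser.
async function likingUserIds(feedUrl: string): Promise<string[]> {
  const xml = await (await fetch(feedUrl)).text();
  const doc = new DOMParser().parseFromString(xml, "application/xml");
  return Array.from(doc.getElementsByTagNameNS(GR_NS, "likingUser")).map(
      (el) => el.textContent?.trim() ?? "");
}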

Liking information isn't just limited to Reader shared item feeds. If you use Reader's view of a feed, for example The Big Picture's, you can see the <gr:likingUser> elements there too. This means that as a publisher you can extract this information and see which of your items Reader users find interesting.

For now liking information that is included inline in the feed is limited to 100 users, mainly for performance reasons. That number may go up (or down) as we see how this feature is used. However, if you'd like to get at all of the liker information for a specific item, you can plug in an item ID into the /reader/api/0/likers API endpoint, and then get at it in either JSON or XML formats.
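
Hypothetical usage of that endpoint (the output= and i= parameter names follow Reader API conventions elsewhere, but treat them as assumptions):

// Fetch the full liker list for a single item as JSON.
async function likersForItem(itemId: string): Promise<unknown> {
  const url =
      "/reader/api/0/likers?output=json&i=" + encodeURIComponent(itemId);
  return (await fetch(url)).json();
}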

* I've seen some wondering what the difference between liking, sharing and starring is. To some degree that's up to each user, but one nice thing about liking is that it has less baggage associated with it. We learned that if we try to redefine existing behaviors (like sharing) users get upset.

Intern on the Google Reader team #

Having interns has worked out well for the Reader team. Following my blog post, we were very pleased to get Nitin Shantharam and Jason Hall to help us out with Reader development. Their stints on the team resulted in a bunch of features, and Jason is now back at Google working full-time (Nitin wasn't a slacker, he's just still in school).

We're looking for another intern or two this year. Internships generally last a couple of months to twelve weeks, are for full-time students, and would be in Google's Mountain View, California office. You can work on either Reader's backend (a C++ system for crawling millions of feeds, handling lots of items being read, shared, starred or tagged per second) or frontend (Java servers and JavaScript/AJAX-y craziness) depending on your interests and experience.

If you or anyone you know is interested in this internship, contact me at mihaip at google dot com. This page also has more general information about interning at Google.

Communicating through screenshots #

Intern on the Google Reader team #

The Reader team is hoping to have a student intern or two this coming summer. We're fast moving and always have more ideas than manpower, so an internship can be quite rewarding as far as the "working on real stuff" factor goes. For example, our intern last year, Brad, worked on the subscribe and feed search functionalities of the new Reader that launched last September. You can intern in Google's New York or Mountain View offices, working on either Reader's frontend/UI or backend.

If you or anyone you know is interested in this internship, contact me at mihai at persistent dot info. This page also has more general information about interning at Google.

Understanding feed reader marketshare numbers #

Update on 2/25: FeedBurner has published a post discussing this same issue but providing numbers for their whole userbase, which makes it even more interesting.

Ever since the Reader team announced that we were making public subscriber counts (thanks Justin), bloggers have been excitedly posting about the bumps they're seeing in their subscriber stats. I'm obviously very happy that Reader is getting all this attention, and that we turn out to be quite popular when compared to other feed readers. However, these statistics need a bit of interpretation. Most people post charts of their subscriber counts, like this one for this blog:

FeedBurner subscribers

For web-based readers where feeds are fetched on behalf of multiple users, the subscriber number is based on what the site reports. To the best of my knowledge, with the exception of My Yahoo!, these numbers are total subscribers, even if an account is inactive. Unless the site is aggressive about cleaning up inactive accounts, these numbers are only upper bounds on the number of actual readers that you have.

A more interesting number to look at is how many viewers each item gets from each feed reader. FeedBurner provides this as part of their TotalStats package. By embedding a small tracking image in your burned posts and looking at referrers, it's possible to see these item-specific views. Here are how many views and clicks my post from yesterday got in various feed readers:

FeedBurner item use

From this it would appear that Reader has an even bigger lead over Bloglines (though given the biases in this blog's readership, I'm not reading too much into this). There are other factors involved here too. The user bases for feed readers are not identical; if an item appeals more to one population than another, that may skew things. Additionally, some readers (especially homepage-style ones like My Yahoo!, Google Personalized Homepage and Netvibes) don't have to display the item body and allow users to jump straight to the post page. These would show up in the "Clicks" column but not in the "Views" one.

What becomes apparent is that none of these statistics provide a complete picture of your readership, but that when used together they can still give you broad trends and help you tailor your content to your audience.

Google Reader Redux #

The new version of Reader has been out there long enough (and is now stable enough) that I have some time to catch my breath and make this post (my post-launch post last year came only a couple of days after the big announcement). I've jotted down some of my thoughts from the past few weeks, continuity will not be high.

There were some hints that something big was coming. Chris's Twitter updates were sounding rather intense. Someone in the discussion group inferred from my lack of posts that a major update was imminent (or that I stopped caring - never!). We even invited some bloggers for a sneak peek at the new Reader* but they were nice and respected their embargo.

Reader is in Google Labs, and that puts it in the "throw it against the wall and see what sticks" product family. I'm glad that people seem to have realized that this "throwing" and "seeing" are less passive than they sound. To stretch this metaphor further, if the spaghetti starts to slide off, engineers (and UI designers, and product managers, and others) will study the problem and figure out how to increase its coefficient of friction. Usually the changes are more subtle (witness the myriad of tweaks that have been done to the Google Video homepage) which is perhaps why there is this perception that no post-launch changes are made.

Gmail and Google Reader integration

A lot of people have remarked on the similarities between the new Reader interface and Gmail's. With this in mind, I've created a simple Greasemonkey script that adds a "Feeds" link in Gmail. When clicked, Reader's list view is loaded on the right. To install the script (and Greasemonkey if you have never used it before):

  1. Install Greasemonkey from http://greasemonkey.mozdev.org/
  2. Restart Firefox.
  3. Click on the script link above.
  4. Click on the "Install" button that's displayed in the upper-right corner of the page.
  5. Visit/reload Gmail.

You may wonder why I felt the need to write a Greasemonkey script for my own product. The answer is that integrations are hard and generally require a lot of effort before you can even determine if they are worthwhile. Greasemonkey lets you experiment with UI concepts with minimal effort necessary from either team (I had to make exactly one change to Reader to better support this script, and that was the ability to force list view to be used, even if expanded view is normally selected). I can't really say what, if any, our integration plans are, but enough users have asked for something like this that I thought writing the script was the most expedient way to provide this (unofficial) feature.

I am still subscribed to the "google reader" Blog Search feed, so that is one way to reach the team with feedback. The discussion group is also being monitored, though with the increased volume we now find it hard to respond to a lot of posts. But please keep the feedback coming, it's been great to get direct, concrete indicators for what we should work on next.

* It is rather frustrating to have to call it "the new Reader" or more formally "the new version of Google Reader." It's unfortunate that version numbers are passé; "2.0.1" is a more accurate and concise representation of where Reader is right now.

Poor Man's Google Reader Search #

My girlfriend is subscribed to a dozen or so cooking/food blogs (using Google Reader). She stars recipes she's interested in, but since there are so many that catch her eye, she now has over one hundred starred items. Finding a recipe from several months ago among them is not easy, since Reader does not support search (yes, we know).

Since these starred items are shared (so my mom can read them too), I thought that I could use Reader's JSON output to allow at least a primitive type of searching. The result is this filtering UI. You can either plug in a public label name (of the form user/user ID/label/label name) or a feed URL and a search term, and all items with titles containing the term will be displayed.

This hack uses the continuation token that Reader provides, so that more than one chunk of items can be fetched.
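
The paging loop at the heart of this amounts to the following sketch (the c= parameter and the continuation field are how I remember Reader's JSON output working; treat the exact names as assumptions):

// Fetch every chunk of a Reader stream's JSON output by following the
// continuation token until it stops being returned.
async function fetchAllItems(streamUrl: string): Promise<any[]> {
  const items: any[] = [];
  let continuation: string | undefined;
  do {
    const sep = streamUrl.includes("?") ? "&" : "?";
    const url = continuation
        ? `${streamUrl}${sep}c=${encodeURIComponent(continuation)}`
        : streamUrl;
    const chunk = await (await fetch(url)).json();
    items.push(...(chunk.items ?? []));
    continuation = chunk.continuation; // absent on the final chunk
  } while (continuation);
  return items;
}

Filtering is then just a matter of matching the fetched items' titles against the search term on the client side.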