The chief technology officer, Farzad Nazem, said that Google now had the exciting sheen of being "the newest kid on the block" and the fastest-growing company. "I know all about that, because that's how it was for us back in '99, 2000," he said. "We sneezed; it was like it was from God. I understand the whole thing."
But Mr. Nazem and everyone else really wanted to discuss what lay beyond these keyword searches of the entire Web. "You can look at the evolution of search as a play in three acts," said Jeff Weiner, the senior vice president for search and marketing. "The first is the 'public' Web, where if different people type the same query they'll all get the same results." The second, he said, was purely personal search - finding a file or photo, usually on your own machine.
"The third is the one that we are very interested in," Mr. Weiner said. This is "social" or "community" searching, in which each attempt to find the right restaurant listing, medical advice site, vacation tip or other bit of information takes advantage of other people's successes and failures in locating the same information.
More at: A Journey to the Center of Yahoo
November 7, 2005
November 6, 2005
Google Print : 10,000 searchable books
Google completed the first major expansion of its Google Print database of searchable books, adding the full text of more than 10,000 works that are no longer under copyright, culled from the collections of four major research libraries. The additions, from the university libraries at Michigan, Harvard and Stanford and from the New York Public Library, represent the first large group of material to be made available electronically from those libraries, which along with Oxford University contracted with Google last year to let the company scan and make searchable the contents of much or all of their collections.
Google BigTable - multi-dimensional sparse map
The Google Reader blog points to an explanation of BigTable, a Google in-house system for handling large amounts of data in a “semi-structured manner”, as an RSS reader would need. The following notes by Andrew Hitchcock from October 18, 2005 are based on a talk given by Google’s Jeff Dean at the University of Washington (they are licensed under a Creative Commons License).
First an overview. BigTable has been in development since early 2004 and has been in active use for about eight months (about February 2005). There are currently around 100 cells for services such as Print, Search History, Maps, and Orkut. Following Google’s philosophy, BigTable was an in-house development designed to run on commodity hardware. BigTable allows Google to have a very small incremental cost for new services and expanded computing power (they don’t have to buy a license for every machine, for example). BigTable is built atop their other services, specifically GFS, Scheduler, Lock Service, and MapReduce.
Each table is a multi-dimensional sparse map. The table consists of rows and columns, and each cell has a time version. There can be multiple copies of each cell with different times, so they can keep track of changes over time. In his examples, the rows were URLs and the columns had names such as “contents:” (which would store the file data) or “language:” (which would contain a string such as “EN”).
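The data model described above can be pictured with a small in-memory toy. This is only a sketch of the idea of a sparse map keyed by row, column, and time; the class name `SparseTable` and its methods are made up for illustration and say nothing about how Google actually implements it.

```python
from collections import defaultdict


class SparseTable:
    """Toy model: (row, column) -> {timestamp: value}. Hypothetical, for illustration."""

    def __init__(self):
        # Only cells that were written take up space, which is what makes the map sparse.
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value, or the newest value at or before `timestamp`."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        eligible = [t for t in versions if t <= timestamp]
        return versions[max(eligible)] if eligible else None


table = SparseTable()
table.put("www.cnn.com", "contents:", 1, "<html>v1</html>")
table.put("www.cnn.com", "contents:", 2, "<html>v2</html>")
table.put("www.cnn.com", "language:", 1, "EN")

print(table.get("www.cnn.com", "contents:"))               # newest version
print(table.get("www.cnn.com", "contents:", timestamp=1))  # version as of time 1
```

Keeping every version under its own timestamp is what lets a reader ask for the state of a cell at an earlier time.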
In order to manage the huge tables, the tables are split at row boundaries and saved as tablets. Tablets are each around 100-200 MB and each machine stores about 100 of them (they are stored in GFS). This setup allows fine-grained load balancing (if one tablet is receiving lots of queries, the machine can shed other tablets or move the busy tablet to another machine) and fast rebuilding (when a machine goes down, other machines each take one tablet from the downed machine, so 100 machines each get a new tablet, but the load on each machine to pick up the new tablet is fairly small).
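A rough way to picture both ideas, splitting at row boundaries and spreading a dead machine's tablets thinly, is the sketch below. The split keys, machine names, and the round-robin policy are all assumptions made for the example; the talk notes do not say how assignment is actually decided.

```python
import bisect

# Hypothetical split points: tablet 0 holds rows below "com.d", tablet 1 holds
# rows from "com.d" up to (not including) "com.m", and so on.
SPLIT_KEYS = ["com.d", "com.m", "org.a"]


def tablet_for_row(row_key, split_keys=SPLIT_KEYS):
    """Return the index of the tablet whose row range contains row_key."""
    return bisect.bisect_right(split_keys, row_key)


def redistribute(tablets, survivors):
    """Hand a failed machine's tablets to surviving machines, round-robin."""
    assignment = {}
    for i, tablet in enumerate(tablets):
        machine = survivors[i % len(survivors)]
        assignment.setdefault(machine, []).append(tablet)
    return assignment


# About 100 tablets on the failed machine, spread over 100 survivors:
lost_tablets = [f"tablet-{i}" for i in range(100)]
survivors = [f"machine-{i}" for i in range(100)]
plan = redistribute(lost_tablets, survivors)

print(tablet_for_row("com.cnn.www"))                # 0
print(max(len(taken) for taken in plan.values()))   # 1
```

With as many survivors as lost tablets, no machine picks up more than one, which is the "fairly small" recovery load the notes mention.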
Tablets are stored on systems as immutable SSTables and a tail of logs (one log per machine). When system memory is filled, it compacts some tablets. He went kind of fast through this, so I didn’t have time to write everything down, but here is the overview: There are minor and major compactions. Minor compactions involve only a few tablets, while major ones involve the whole system. Major compactions can reclaim hard disk space. The locations of the tablets are actually stored in special BigTable cells. The lookup is a three-level system. The clients get a pointer to the META0 tablet (there is only one). This tablet is heavily used, and so one machine usually ends up shedding all its other tablets to support the load. The META0 tablet keeps track of all the META1 tablets. These tablets contain the location of the actual tablet being looked up. There is no big bottleneck in the system, because they make heavy use of pre-fetching and caching.
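The lookup chain (META0, then a META1 tablet, then the machine holding the data tablet) can be sketched as nested sorted maps with a client-side cache. Everything here is hypothetical: the `MetaTablet` class, the end-key layout, the machine names, and caching per row key rather than per row range are simplifications chosen to keep the example short.

```python
import bisect


class MetaTablet:
    """Sorted map from the last row key a child covers to that child (illustrative)."""

    def __init__(self, entries):
        # entries: list of (end_row_key, child), sorted by end_row_key
        self._ends = [end for end, _ in entries]
        self._children = [child for _, child in entries]

    def lookup(self, row_key):
        # First child whose end key is >= row_key covers this row.
        i = bisect.bisect_left(self._ends, row_key)
        if i == len(self._ends):
            raise KeyError(row_key)
        return self._children[i]


LAST = "\uffff"  # sentinel end key that sorts after the row keys used here

meta1_a = MetaTablet([("com.m", "machine-17"), ("com.z", "machine-42")])
meta1_b = MetaTablet([("org.m", "machine-03"), (LAST, "machine-88")])
meta0 = MetaTablet([("com.z", meta1_a), (LAST, meta1_b)])

location_cache = {}  # client-side cache so repeat lookups skip the META tablets


def locate(row_key):
    """Resolve row_key to the machine serving its tablet."""
    if row_key in location_cache:
        return location_cache[row_key]
    meta1 = meta0.lookup(row_key)      # level 1: META0 points to a META1 tablet
    location = meta1.lookup(row_key)   # level 2: META1 points to the data tablet
    location_cache[row_key] = location
    return location                    # level 3: the client contacts this machine


print(locate("com.cnn.www"))       # machine-17
print(locate("org.wikipedia.en"))  # machine-88
```

After the first call for a row, the cache answers directly, which hints at why the single META0 tablet does not have to become a bottleneck.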
Back to columns. Columns are in the form of “family:optional_qualifier”. In his example, the row “www.cnn.com” might have the columns “contents:” with the HTML of the page, “anchor:cnn.com/news” with the anchor text of that link (“CNN Homepage”), and “anchor:stanford.edu/” with that anchor text (“CNN”). Columns have type information. Column families can have attributes/rules that apply to their cells, such as “keep n time entries” or “keep entries less than n days old”. When tablets are rebuilt, these rules are applied to get rid of any expired entries. Because of the design of the system, columns are easy to create (and are created implicitly), while column families are heavy to create (since you specify things like type and attributes). In order to optimize access, column families can be split into locality groups. Locality groups cause the columns to be split into different SSTables (or tablets?). This increases performance because small, frequently accessed columns can be stored in a different spot than the large, infrequently accessed columns.
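As an illustration of the “family:optional_qualifier” naming and the two expiry rules quoted above, here is a minimal sketch. The function names and the `max_versions` / `max_age_days` parameters are invented for the example and are not BigTable's own interface.

```python
SECONDS_PER_DAY = 86400


def split_column(column):
    """Split 'family:optional_qualifier' into (family, qualifier)."""
    family, _, qualifier = column.partition(":")
    return family, qualifier


def apply_family_rules(versions, now, max_versions=None, max_age_days=None):
    """Drop expired versions of one cell.

    versions: dict mapping timestamp (seconds) -> value.
    max_versions models "keep n time entries";
    max_age_days models "keep entries less than n days old".
    """
    kept = sorted(versions.items(), reverse=True)  # newest first
    if max_age_days is not None:
        cutoff = now - max_age_days * SECONDS_PER_DAY
        kept = [(t, v) for t, v in kept if t > cutoff]
    if max_versions is not None:
        kept = kept[:max_versions]
    return dict(kept)


print(split_column("anchor:cnn.com/news"))  # ('anchor', 'cnn.com/news')
print(split_column("contents:"))            # ('contents', '')

history = {100: "a", 200: "b", 300: "c"}
print(apply_family_rules(history, now=300, max_versions=2))  # {300: 'c', 200: 'b'}
```

Attaching the rule to the family rather than to each column fits the note that columns are cheap and created implicitly while families are declared up front.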
All the tablets on one machine share a log; otherwise, one million tablets in a cluster would result in way too many files opened for writing (there seems to be a discrepancy here: he said 100 tablets per machine and 1000 machines, but that comes to 100,000 tablets, not one million). New log chunks are created every so often (like 64 MB, which would correspond with the size of GFS chunks). When a machine goes down, the master redistributes its log chunks to other machines to process (and these machines store the processed results locally). The machines that pick up the tablets then query the master for the location of the processed results (to update their recently acquired tablet) and then go directly to the machine for their data.
There is a lot of redundant data in their system (especially through time), so they make heavy use of compression. He went kind of fast and I only followed part of it, so I’m just going to give an overview. Their compression looks for similar values along the rows, columns, and times. They use variations of BMDiff and Zippy. BMDiff gives them high write speeds (~100MB/s) and even faster read speeds (~1000MB/s). Zippy is similar to LZW. It doesn’t compress as highly as LZW or gzip, but it is much faster. He gave an example of a web crawl they compressed with the system. The crawl contained 2.1B pages and the rows were named in the following form: “com.cnn.www/index.html:http”. The size of the uncompressed web pages was 45.1 TB and the compressed size was 4.2 TB, yielding a compressed size of only 9.2%. The links data compressed to 13.9% and the anchors data compressed to 12.7% of the original size.
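The row naming in that crawl example puts the hostname in reverse order, so pages from the same site sort next to each other. A small sketch of how such a key could be derived from a URL follows; the function name `row_key` is made up, and the exact rules Google uses (for ports, query strings, and so on) are not given in the notes.

```python
from urllib.parse import urlsplit


def row_key(url):
    """Build a 'reversed.host/path:scheme' key like the one quoted in the talk notes."""
    parts = urlsplit(url)
    reversed_host = ".".join(reversed(parts.hostname.split(".")))
    return f"{reversed_host}{parts.path}:{parts.scheme}"


print(row_key("http://www.cnn.com/index.html"))  # com.cnn.www/index.html:http

# Sorting by key groups pages of one domain together:
urls = [
    "http://www.cnn.com/index.html",
    "http://www.stanford.edu/",
    "http://money.cnn.com/markets",
]
print(sorted(row_key(u) for u in urls))
```

Keeping similar pages adjacent is plausibly part of why compression that looks for similar values along rows does so well on crawl data.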
They have their eye on the future with some features under consideration. 1. Expressive data manipulation, including having scripts sent to clients to modify data. 2. Multi-row transaction support. 3. General performance for larger cells. 4. BigTable as a service. It sounds like each service (such as Maps or Search History) has its own cluster running BigTable. They are considering running a Google-wide BigTable system, but that would require fairly splitting resources and compute time, etc.
Google Talk video plugin
Google Talk, Google’s messaging and VOIP application, is now video-enabled with a new plug-in from Festoon. Using the video plug-in, Google Talk users can now see who they are talking to, in real time. Festoon, which also has the most popular video plug-in for Skype with more than 2.75 million downloads, allows Google Talk users to talk to and see each other, play games, share pictures, or conduct business.
By installing the Festoon Google Talk Video Plug-in, Google Talk users can securely conduct video voice calls in groups from 2 to 200 and share photos, spreadsheets, presentations, or applications with others on a call. Festoon offers high-quality group voice utilizing the industry leading GIPS audio codec used also by Google Talk, Skype, and AIM Triton. Festoon’s Video plug-ins are advertiser supported.
November 4, 2005
Google AdSense Referral program: one dollar for a new Firefox user
Google launched the AdSense referral program. Referrals is a feature of AdSense that allows you to increase your revenue, while increasing your users' awareness of useful products and services. By adding a referral button to your site, you can direct users to products like AdSense and Firefox + Google Toolbar. When your referral connects a user to AdSense or Firefox, you can generate more earnings while helping new web publishers monetize their websites or improve their web browsing experience.
When a user you've referred to AdSense first earns US $100, Google will credit your AdSense account with US $100. When a user you've referred to Firefox plus Google Toolbar runs Firefox for the first time, you'll receive up to $1 in your account, depending on the user's location.
Bill Gates on the future of the PC
Computer users are increasingly turning to new devices for accessing key applications and information. PDAs, mobile phones, even digital TV, are all changing attitudes towards the ubiquitous PC. But, not surprisingly, Microsoft chairman Bill Gates still sees the PC – albeit a very different one – as the future.
"The PC will be able to recognise speech, you will be able to use ink with it, and it will have a camera capability so it can see what is going on," he says.
"The tablet form factor will be something you just take with you to meetings. There is a lot to be done. The PC will be a phenomenal device compared with what it is now."
More at: Bill Gates exclusive interview.
November 3, 2005
Google Desktop 2 out of beta
Google Desktop 2 and Google Desktop for Enterprise have emerged from beta testing and can now be considered full-fledged software. In a sure sign that the search software has finally arrived, Google Desktop has its own blog.
Google Desktop 2, a free downloadable application, combines desktop search and the Google Sidebar, a floating tool palette that offers personalized news and other information based on a user's habits.
Dozens of new third-party Sidebar panels are now available, such as iTunes and Winamp controls and an American Express panel to track and view credit card transactions in real time. The new software can also display maps related to the sites one visits while surfing the Net.
In addition, developers can use simple JavaScript to write plug-ins for Google Desktop 2.



November 2, 2005
Web 2.0

Great O'Reilly article that tries to define Web 2.0.
One of the most highly touted features of the Web 2.0 era is the rise of blogging. Personal home pages have been around since the early days of the web, and the personal diary and daily opinion column around much longer than that, so just what is the fuss all about?
At its most basic, a blog is just a personal home page in diary format. But as Rich Skrenta notes, the chronological organization of a blog "seems like a trivial difference, but it drives an entirely different delivery, advertising and value chain."
One of the things that has made a difference is a technology called RSS. RSS is the most significant advance in the fundamental architecture of the web since early hackers realized that CGI could be used to create database-backed websites. RSS allows someone to link not just to a page, but to subscribe to it, with notification every time that page changes. Skrenta calls this "the incremental web." Others call it the "live web".
Windows Live on Live.com
Bill Gates announced Windows Live and Office Live. The Windows Live homepage looks underwhelming in Internet Explorer, and is broken in Firefox.
So now on Live.com we have weather forecasts, stock quotes, email, horoscopes, and an RSS reader. You can drag & drop and expand & collapse sections to your liking, and if you’re logged in, you can also see your new Hotmail messages right on the front page.
As soon as we see MS Word, MS Excel, and MS PowerPoint as actual working (D)HTML pages (Q1 of 2006), this could become interesting.



More at: Crunchnotes.com
November 1, 2005
Google tweaks OpenOffice
OpenOffice has its roots in Sun Microsystems' StarOffice suite of programs. Five years ago, Sun turned its proprietary software into an open-source project. Only recently, however, has the competitor to Microsoft's Office attracted serious attention.
Now Google believes it can help OpenOffice--perhaps working to pare down the software's memory requirements or its mammoth 80MB download size, said Chris DiBona, manager for open-source programs at the search company.
Google's manager for open-source programs Chris DiBona told news.com: "We want to hire a couple of people to help make OpenOffice better."