Google Hacks, 2nd Edition
By Tara Calishain, Rael Dornfest
Publisher: O'Reilly
Pub Date: December 2004
ISBN: 0-596-00857-0
Pages: 479
Copyright
Dedication
Foreword
Foreword to the First Edition
Credits
About the Authors
Contributors
Acknowledgments
Preface
Why Google Hacks?
How This Book Is Organized
How to Use This Book
How to Run the Hacks
Where to Go for More
Conventions
Using Code Examples
Safari Enabled
How to Contact Us
Got a Hack?
Chapter 1. Web
Hacks 1-20
Section 1.2. Google Web Search Basics
Section 1.3. Full-Word Wildcards
Section 1.4. The 10-Word Limit
Section 1.5. Special Syntax
Section 1.6. Mixing Syntax
Section 1.7. Advanced Search
Section 1.8. Quick Links
Section 1.9. Language Tools
Section 1.10. Anatomy of a Search Result
Section 1.11. Setting Preferences
Section 1.12. Understanding Google URLs
Hack 1. Browse the Google Directory
Hack 2. Glean a Snapshot of Google in Time
Hack 3. Graph Google Results over Time
Hack 4. Visualize Google Results
Hack 5. Check Your Spelling
Hack 6. Google Phonebook: Let Google's Fingers Do the Walking
Hack 7. Think Global, Google Local
Hack 8. Track Stocks
Hack 9. Consult the Dictionary
Hack 10. Look Up Definitions
Hack 11. Search Article Archives
Hack 12. Find Directories of Information
Hack 13. Seek Out Weblog Commentary
Hack 14. Cover Your Bases
Hack 15. Repetition Matters
Hack 16. Search a Particular Date Range
Hack 17. Calculate Google Centuryshare
Hack 18. Hack Your Own Google Search Form
Hack 19. Go Beyond Google's Advanced Search
Hack 20. Use Google Tools for Translators
Chapter 2. Advanced Web
Hacks 21-49
Section 2.2. Assumptions
Hack 21. Like a Version
Hack 22. Capture Google Results in a Google Box
Hack 23. Build Google Directory URLs
Hack 24. Find Recipes
Hack 25. Track Result Counts over Time
Hack 26. Feel Really Lucky
Hack 27. Get Random Results (on Purpose)
Hack 28. Permute a Query
Hack 29. Weight a Query Keyword
Hack 30. Restrict Searches to Top-Level Results
Hack 31. Search for Special Characters
Hack 32. Dig Deeper into Sites
Hack 33. Summarize Results by Domain
Hack 34. Measure Google Mindshare
Hack 35. SafeSearch Certify URLs
Hack 36. Search Google Topics
Hack 37. Find the Largest Page
Hack 38. Perform Proximity Searches
Hack 39. Meander Your Google Neighborhood
Hack 40. Run a Google Popularity Contest
Hack 41. Scrape Yahoo! Buzz for a Google Search
Hack 42. Compare Google's Results with Other Search Engines
Hack 43. Scattersearch with Yahoo! and Google
Hack 44. Yahoo! Directory Mindshare in Google
Hack 45. Glean Weblog-Free Google Results
Hack 46. Spot Trends with Geotargeting
Hack 47. Bring the Google Calculator to the Command Line
Hack 48. Build a Custom Date Range Search Form
Hack 49. Search Yesterday's Index
Chapter 3. Images
Hacks 50-53
Section 3.2. Google Images Advanced Search Interface
Section 3.3. Google Images Search Syntax
Hack 50. Borrow a Corporate or Product Logo
Hack 51. Browse the World Wide Photo Album
Hack 52. Google Cartography: Street Art in Your Neighborhood
Hack 53. Capture the Map
Chapter 4. News and Groups
Hacks 54-58
Section 4.2. Google News
Section 4.3. Google News Search Syntax
Section 4.4. Advanced News Search
Section 4.5. Making the Most of Google News
Section 4.6. Receive Google News Alerts
Section 4.7. Beyond Google for News Search
Hack 54. Scrape Google News
Hack 55. Visualize Google News
Section 4.10. Google Groups
Section 4.11. 10 Seconds of Hierarchy Funk
Section 4.12. Browsing Groups
Section 4.13. Google Groups Search Syntax
Section 4.14. Advanced Groups Search
Hack 56. Go Deeper into Groups with Google Groups 2
Hack 57. Scrape Google Groups
Hack 58. Simplify Google Groups URLs
Chapter 5. Add-Ons
Hacks 59-70
Hack 59. Keep Tabs on Your Searches with Google Alerts
Hack 60. Add Google to Your Toolbar or Desktop
Hack 61. Google Your Desktop
Hack 62. Google with Bookmarklets
Hack 63. Google from Word
Hack 64. Google by Email
Hack 65. Google by Instant Messenger
Hack 66. Google from IRC
Hack 67. Google on the Go
Hack 68. Visit the Google Labs
Hack 69. Find Out What Google Thinks ___ Is
Hack 70. The Search Engine Belt Buckle
Chapter 6. Gmail
Hacks 71-80
Section 6.2. Gmail Search Syntax
Section 6.3. Additional Resources
Hack 71. Glean a Gmail Invite
Hack 72. Create and Use Custom Addresses
Hack 73. Import Your Contacts into Gmail
Hack 74. Import Mail into Gmail
Hack 75. Export Your Gmail
Hack 76. Take a Walk on the Lighter Side
Hack 77. Gmail on the Go
Hack 78. Use Gmail as a Linux Filesystem
Hack 79. Use Gmail as a Windows Drive
Hack 80. Program Gmail
Chapter 7. Ads
Hacks 81-85
Section 7.2. Google AdSense
Section 7.3. Google AdWords
Hack 81. Get the Most out of AdWords
Hack 82. Generate Google AdWords
Hack 83. Scrape Google AdWords
Hack 84. Determine the Worth of AdWords Words
Hack 85. Serve Backup Ads
Chapter 8. Webmastering
Hacks 86-91
Section 8.2. Google's Importance to Webmasters
Section 8.3. The Mysterious PageRank
Section 8.4. The Equally Mysterious Ranking Algorithm
Section 8.5. Keeping Up with Google's Changes
Section 8.6. In a Word: Relax
Hack 86. A Webmaster's Introduction to Google
Hack 87. Get Inside the PageRank Algorithm
Hack 88. 26 Steps to 15K a Day
Hack 89. Be a Good Search Engine Citizen
Hack 90. Clean Up for a Google Visit
Hack 91. Remove Your Materials from Google
Chapter 9. Programming Google
Hacks 92-100
Section 9.2. Signing Up and Google's Terms
Section 9.3. The Google Web APIs Developer's Kit
Section 9.4. Using Your Google API Key
Section 9.5. What's WSDL?
Section 9.6. Understanding the Google API Query
Section 9.7. Understanding the Google API Response
Section 9.8. A Note on Spidering and Scraping
Hack 92. Program Google in Perl
Hack 93. Install the SOAP::Lite Perl Module
Hack 94. Program Google with the Net::Google Perl Module
Hack 95. Loop Around the 10-Result Limit
Hack 96. Program Google in PHP
Hack 97. Program Google in Java
Hack 98. Program Google in Python
Hack 99. Program Google in C# and .NET
Hack 100. Program Google in VB.NET
Colophon
Index
Copyright © 2005 O'Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway
North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or
sales promotional use. Online editions are also available for
most titles (http://safari.oreilly.com). For more information,
contact our corporate/institutional sales department: (800)
998-9938 or corporate@oreilly.com.
Nutshell Handbook, the Nutshell Handbook logo, and the
O'Reilly logo are registered trademarks of O'Reilly Media, Inc.
The Hacks series designations, Google Hacks, the image of
locking pliers, and related trade dress are trademarks of O'Reilly
Media, Inc. All other trademarks are property of their respective
owners.
Google, PageRank, AdSense, AdWords, Gmail, and I'm Feeling
Lucky are trademarks of Google Technology, Inc.
Many of the designations used by manufacturers and sellers to
distinguish their products are claimed as trademarks. Where
those designations appear in this book, and O'Reilly Media, Inc.
was aware of a trademark claim, the designations have been
printed in caps or initial caps.
While every precaution has been taken in the preparation of this
book, the publisher and authors assume no responsibility for
errors or omissions, or for damages resulting from the use of the
information contained herein.
Dedication
To our Grannies: Olivia and Miriam
Foreword
When the first edition of Google Hacks appeared, we were frankly a
bit surprised that there was enough hacking going on to make up
a whole book. No longer. People continue to discover more ways
than we ever imagined to tweak, tone, and otherwise futz around
with Google bits for myriad uses.
In the 18 months since Google Hacks first appeared, search has,
if possible, only grown in importance. Not only is there more
information than ever to be found (via email, computer hard
drives, and newly digitized repositories of previously offline
content), there is also a greater need to automate tasks and to
locate that needle of information in a haystack that just will not
stop growing.
We hope that you enjoy this new Google Hacks effort and continue
to help us make the most of the world's information by making it
universally accessible and useful.
--Craig Silverstein, Director of Technology, Google
Foreword to the First Edition
When we started Google, it was hard to predict how big it would
become. That our search engine would someday serve as a
catalyst for so many important web developments was a distant
dream. We are honored by the growing interest in Google and
offer many thanks to those who created this book, the largest and
most comprehensive report on Google search technology that
has yet to be published.
Search is an amazing field of study, because it offers infinite
possibilities for how we might find and make information
available to people. We join with the authors in encouraging
readers to approach this book with a view toward discovering
and creating new ways to search. Google's mission is to
organize the world's information and make it universally
accessible and useful, and we welcome any contribution you
make toward achieving this goal.
Hacking is the creativity that fuels the Web. As software
developers ourselves, we applaud this book for its adventurous
spirit. We're adventurous, too, and were happy to discover that
this book highlights many of the same experiments we conduct
on our free time here at Google.
Google is constantly adapting its search algorithms to match
the dynamic growth and changing nature of the Web. As you
read, please keep in mind that the examples in this book are
valid today but, as Google innovates and grows over time, may
become obsolete. We encourage you to follow the latest
developments and to participate in the ongoing discussions
about search as facilitated by books such as this one.
Virtually every engineer at Google has used an O'Reilly
publication to help them with their jobs. O'Reilly books are a
staple of the Google engineering library, and we hope that Google
Hacks will be as useful to others as the O'Reilly publications
have been to Google.
With the largest collection of web documents in the world,
Google is a reflection of the Web. The hacks in this book are not
just about Google, they are also about unleashing the vast
potential of the Web today and in the years to come. Google Hacks
is a great resource for search enthusiasts, and we hope you
enjoy it as much as we did.
Thanks,
--The Google Engineering Team
December 11, 2002
Mountain View, California
Credits
About the Authors
Contributors
Acknowledgments
About the Authors
Tara Calishain is the editor of ResearchBuzz
(http://www.researchbuzz.com), a weekly newsletter on Internet
searching and online information resources. She's also a regular
columnist for Searcher magazine. She's been writing about search
engines and searching since 1996; her recent books include
Web Search Garage.
Rael Dornfest is Chief Technology Officer at O'Reilly Media. He
assesses, experiments, programs, fiddles, fidgets, and writes for
the O'Reilly Network and various O'Reilly publications. Rael is
Series Editor of the O'Reilly Hacks series
(http://hacks.oreilly.com) and has edited, contributed to, and
coauthored various O'Reilly books, including Mac OS X Panther
Hacks, Mac OS X Hacks, the Google Pocket Guide, Google: The
Missing Manual, Essential Blogging, and Peer to Peer: Harnessing
the Power of Disruptive Technologies. He is also Program Chair for
the O'Reilly Emerging Technology Conference
(http://conferences.oreilly.com/etech). In his copious free time,
Rael develops bits and bobs of freeware, particularly the
Blosxom weblog application (http://www.blosxom.com), is Editor
in Chief of MobileWhack (http://www.mobilewhack.com), and
(more often than not) maintains his Raelity Bytes weblog
(http://www.raelity.org).
Contributors
The following people contributed their hacks, writing, and
inspiration to this book:
Tim Allwine is a Senior Software Engineer at O'Reilly
Media. He develops software for the Market Research
group (various spidering tools that collect data from
disparate sites) and is involved in the development of web
services at O'Reilly.
DJ Adams (http://www.pipetree.com/qmacro) is an SAP
hacker who pines for the days when he wrote job control
language and S/370 assembler and got around central
London on his skateboard. Currently, he is knee-deep in
NetWeaver technologies and uses up spare brain cycles
playing with REST, RDF, and Jabber. He wrote O'Reilly's
Programming Jabber: Extending XML Messaging and
cowrote Google Pocket Guide, also from O'Reilly. He lives
in Europe with Sabine and Joseph.
AvaQuest (http://www.avaquest.com) is a
Massachusetts-based IT services firm that specializes
in applying advanced information retrieval,
categorization, and text mining technologies to solve
real-world problems. GooglePeople and GoogleMovies,
created by AvaQuest consultants Nathan Treloar, Sally
Kleinfeldt, and Peter Richards, came out of a web mining
consulting project the team worked on in the summer of
2002, shortly after the Google Web API was announced.
Paul Bausch (http://www.onfocus.com) is a freelance
web developer and author living in Oregon. He was a
cocreator of the Blogger weblog software and recently
cowrote a book about weblogs called We Blog: Publishing
Online with Weblogs. He believes (like Google) that
"love" (75,700,000) will conquer "hate" (7,900,000).
Erik Benson (http://www.erikbenson.com).
Justin Blanton (http://justinblanton.com) has a B.S. in
computer engineering and is currently attending law
school in Silicon Valley, where he is focusing on
intellectual property law and will likely practice both
patent prosecution and litigation. Much of his "free time"
is spent writing about various things on his web site,
including Mac OS X, mobile phones and other gadgets,
general tips and tricks for the Movable Type CMS, and
life in general.
CapeScience.com (http://www.capescience.com) is the
development community for Cape Clear Software, a web
services company. In addition to providing support for
Cape Clear's products, CapeScience makes all sorts of
fun web services stuff, including live services, clients to
other services, utilities, and other geekware.
Antoni Chan (http://www.alltooflat.com) is one of the
founders of All Too Flat, a bastion of quirky content,
pranks, and geeky humor. The Google Mirror is a
2,500-line CGI script that was developed over the period
of a year starting in October 2001. When not working on his
web site, he enjoys playing music, bowling, and running
after a Frisbee.
Tanya Harvey Ciampi (http://www.multilingual.ch) grew
up in Buckinghamshire, England, and went on to study in
Zurich, where she obtained her diploma in translation.
She now lives in Ticino, the Italian-speaking region of
Switzerland, where she works as an English technical
translator (from Italian, German, and French) and
proofreader, and teaches translation and Internet search
techniques based on her WWW Search Interfaces for
Translators. In her free time, she enjoys fishing with her
father on the west coast of Ireland, writing poems, and
playing Celtic music.
Peter Drayton (http://www.razorsoft.net/weblog/) is a
program manager in the CLR team at Microsoft. Before
joining Microsoft, he was an independent consultant,
trainer for DevelopMentor, and author of C# Essentials
and C# in a Nutshell (O'Reilly).
Andrew Flegg (http://www.bleb.org) works for IBM in the
U.K., having graduated from the University of Warwick a
few years ago. He's currently the webmaster of Hursley
Lab's intranet site. Most of his work (and fun) at the
moment is taken up with Perl, Java, HTML, and CSS.
Andrew is particularly keen on clean, reusable code,
which always ends up saving time in the long run. He's
written several open source projects, as well as a couple
of commercial applications for RISC OS (as used in the
Iyonix PC, the first desktop computer using an Intel
XScale).
Andrew Goodman (http://www.page-zero.com) is founder
and principal of Page Zero Media, which helps clients
perform better on paid search campaigns. He blogs his
thoughts regularly as Editor-at-Large of Traffick.com, a
contrarian's guide to search engines and portals. Fortune
Small Business, The Washington Post, New Media Age, The
New York Times, Bloomberg Markets, Business Week,
Reuters, The National Post, CBS Marketwatch, Forbes, and
numerous other business publications have sought his
views on search advertising. He is author of Winning
Results with Google AdWords (McGraw-Hill, late 2004).
One of his favorite Google hacks is GooPoetry.
Kevin Hemenway (http://www.disobey.com), better
known as Morbus Iff, is the creator of disobey.com, which
bills itself as "content for the discontented." Publisher,
developer, and writer of more home cooking than you
could ever imagine (like the popular open source
syndicated reader AmphetaDesk, the best-kept gaming
secret Gamegrene.com, the popular Ghost Sites and
Nonsense Network, the giggle-inducing articles at the
O'Reilly Network, a few pieces at Apple's Internet
Developer site, etc.), he's an ardent supporter of cloning
merely so he can get more work done. He cooks with a
Fry Pan of Intellect +2 and lives in Concord, New
Hampshire.
Mark Horrell (http://www.markhorrell.com) has worked in
search engine optimization since 1996 when he joined
Net Resources International, a publisher of industrial
engineering web sites, where he conceived and
developed the company's Internet marketing strategy.
He left in 2002 and is now a freelance web developer
based in London, U.K., specializing in search
engine-friendly design.
Judy Hourihan (http://judy.hourihan.com).
Leland Johnson (http://protoplasmic.org) is currently a
student at Illinois Institute of Technology. He tried
learning Perl in 1999, then tried again and was
successful in 2001, and now uses it for everything
except his classes. When he's not busied by his
classes, he updates his weblog, explores Chicago, and
plays far too many video games.
Steven Johnson (http://www.stevenberlinjohnson.com/)
is the author of two books, Emergence and Interface
Culture. He cocreated the sites FEED and Plastic.com,
and now blogs regularly at
http://www.stevenberlinjohnson.com. He writes the
monthly "Emerging Technology" column for Discover
magazine, and his work has appeared in many
publications, including The New York Times, Harper's,
Wired, and The New Yorker. He lives in Brooklyn, New
York.
Richard Jones (http://richard.jones.name) has spent the
last four years working as a software engineer for Agent
Oriented Software (http://www.agent-software.com).
AOS develops a leading intelligent agent development
platform known as JACK Intelligent Agents. Before AOS,
he worked as a software engineer for Senate Software (a
small search technology company), where he developed
web page relevance heuristics. Before that, Richard was
a cofounder of Earthmen Technology, which developed
network intrusion detection technologies. At Earthman,
he was responsible for a majority of the development,
which included low-level TCP/IP networking code, Linux
kernel hacking, and fast-pattern matching algorithms.
He has two degrees, one in computer science and
another in cognitive science, both from LaTrobe
University (http://www.latrobe.edu.au). While in school,
Richard majored in computer science, linguistics, and
psychology, areas he retains a keen interest in. Richard
is also a squash-playing Buddhist.
Stuart Langridge (http://www.kryogenix.org) gets paid to
hack on the Web during the day, and does it for free at
night when he's not arguing about Buffy or Debian
GNU/Linux. He's keen on web standards, Python, and
strange things you can do with JavaScript, all of which
can be seen at his web site and weblog. He's also
slightly surprised that the Google Art Creator, which was
an amusing little hack done in a day, is the most popular
thing he's ever written and got him into a book.
Beau Lebens (http://www.dentedreality.com.au) is a PHP
web developer who believes that even complex systems
can be made simple for an end user. Originally from
Perth, Western Australia, he is currently working in
Hawaii. He has released a number of projects on his web
site, including webpad, the web-based text editor;
AvantBlog, a Palm/Pocket PC Blogging application; and
the PHP Blogger API, which provides PHP developers
with access to the Blogger API. Beau is a big believer in
simpler, distributed technologies like Atom, REST, and
RSS for the future of the Web.
Philipp Lenssen (http://blog.outer-court.com) was born
in 1977 and currently lives in Stuttgart, Germany. He
works as a developer on the web sites of a popular
German car maker. Previously, he spent nine months living
in Malaysia, and he prefers very spicy food. In his
spare time, Philipp is the author behind the daily Google
Blogoscoped (a weblog covering Google, online
research, and internet fun in general), trying to crack his
head on how to tap the web consciousness.
Mark Lyon (http://marklyon.org) is the creator of the
Google GMail Loader. A former programmer for the U.S.
Army Corps of Engineers, he gave up his aspirations of
programming greatness after an unsuccessful interview
at Google. He is now a law student at Mississippi
College in Jackson, Mississippi, with plans to practice
intellectual property and technology law. In his spare
time, he writes novel but mediocre software in whatever
language strikes his fancy.
Paul Mutton (http://www.jibble.org) currently works for
Netcraft in the U.K. He graduated with first-class honors
in computer science, winning the IEE Institution Prize
for being the best overall student in his department. He
uses Google on a daily basis and Internet Relay Chat
(IRC) to collaborate with fellow Ph.D. students in other
countries. In his remaining spare time, he uses his Sun
Certified Java Programmer skills to develop all sorts of
open source software on his personal web site
(http://www.jibble.org). Some of his research has
culminated in the creation of the popular PieSpy
application (http://www.jibble.org/piespy), which infers
and visualizes social networks on IRC and even
appeared on Slashdot once. He can normally be found
jibbling around in #jibble and #irchacks on the freenode
IRC network with the nickname Jibbler, or Paul on
smaller networks.
Mark Pilgrim (http://diveintomark.org) is the author of
Dive Into Python, a free Python book for experienced
programmers, and Dive Into Accessibility, a free book on
web accessibility techniques. He works for MassLight, a
Washington, D.C.-based training and web development
company, where, unsurprisingly, he does training and
web development. But he lives outside Raleigh, North
Carolina, because it's warmer.
Andrew Savikas works in the O'Reilly Tools Group,
where he helps the production department turn
manuscripts into O'Reilly books. Andrew is the author of
Word Hacks, also published by O'Reilly. He developed
and maintains the custom Word template and VBA
macros used by all the O'Reilly authors who don't insist
on writing in POD. Except for the ones who insist on
writing in XML. Or Troff. Andrew also works with
FrameMaker, FrameScript, InDesign, DocBook XML,
Perl, Python, Ruby, and whatever else he finds lying
around the office. He has a degree in communications
from the University of Illinois at Urbana-Champaign, and
lives in Boston with his wife Audrey, who loves to see her
name in print.
Chris Sells (http://www.sellsbrothers.com) is an
independent consultant, speaker, and author specializing
in distributed applications in .NET and COM. He's
written several books and is currently working on
Windows Forms for C# and VB.NET Programmers and
Mastering Visual Studio .NET. In his free time, Chris hosts
various conferences, directs the Genghis
source-available project, plays with Rotor, and makes a
pest of himself in general at Microsoft design reviews.
Alex Shapiro (http://www.touchgraph.com) is the founder
and CTO of TouchGraph LLC. Alex graduated from
Columbia's computer science program in 2000, and
spent his early career at a consulting company. After the
stock-market bubble burst, he decided to spend time
developing a network visualization product he had
conceived. Through network visualization, Alex found
that he could combine his interests in user interface
design, graph theory, and sociology. After seeing a
business demand for his technology, Alex founded
TouchGraph LLC, which is slowly gathering a list of
respected clients.
Kevin Shay (http://www.staggernation.com) is a writer
and web programmer who lives in Brooklyn, New York.
His Google API scripts, Movable Type plug-ins, and
other work can be found at the soon-to-launch
staggernation.com.
Gary Stock (http://www.googlewhack.com/stock.htm)
coined the term "Google whack" while he had intended to
be doing research for UnBlinking
(http://www.unblinking.com). When Gary writes for
UnBlinking, he might better be focused on his role as
CTO of the news clipping and briefing service Nexcerpt
(http://www.nexcerpt.com). Gary works at Nexcerpt to
get a break from stewardship of the unusual flora and
fauna on the 160 acres of woods and wetland that he
owns, which in turn keeps him from spending time with
his wife (and Nexcerpt CEO) Julie, whom he married to
offset his former all-consuming career as an
above-top-secret computer spy, which he had entered to
avoid permanently becoming a jazz arranger and pianist.
Seriously.
Aaron Swartz (http://www.aaronsw.com) is a teenage
writer, coder, and hacker. He is a coauthor of the RSS 1.0
specification, a member of the W3C RDF Core Working
Group, and metadata adviser to the Creative Commons.
He's also the guy behind the Google Weblog
(http://google.blogspace.com). He can be reached at
me@aaronsw.com.
Brett Tabke (http://www.webmasterworld.com) is the
owner/operator of WebmasterWorld.com, the leading
news and discussion site for web developers and search
engine marketers. Tabke has been involved in computing
since the late 1970s and is one of the Internet's
foremost authorities on search engine optimization.
Adam Trachtenberg (http://www.trachtenberg.com) is
Manager of Technical Evangelism at eBay, where he
preaches the gospel of the eBay platform to developers
and businessmen around the globe. Before eBay, Adam
cofounded and served as Vice President for
Development at two companies, Student.Com and
TVGrid.Com. At both firms, he led the front- and
middle-end web site design and development. Adam began
using PHP in 1997 and is the author of Upgrading to PHP
5 and coauthor of PHP Cookbook, both published by
O'Reilly Media. He lives in San Francisco and has a B.A.
and M.B.A. from Columbia University.
Phillip M. Torrone is a feature columnist for
http://www.engadget.com and contributing editor to
Popular Science. Coauthor of Flash Enabled: Design and
Development for Mobile Devices, Phillip has also
contributed to numerous books and magazines on
hardware hacking, cell phones, and PDAs. Phillip's
latest work and more can be found at
http://www.flashenabled.com.
Matt Webb is an engineer and designer, splitting his
working life between R&D with BBC Radio & Music
Interactive and freelance projects (primarily in the
social software world), most recently coauthoring Mind
Hacks for O'Reilly. Online, he can be found at
Interconnected (http://interconnected.org/home) and, in
the real world, in London.
Acknowledgments
We would like to thank all those who contributed their ideas and
code for Google hacks to this book. Many thanks to Nelson
Minar and the rest of the Google Engineering Team, Nate Tyler,
and everyone else at Google who provided ideas, suggestions,
and answers, not to mention the Google Web API itself. And to
Andy Lester and Justin Blanton, our technical editors along the
way, goes much appreciation for their thorough nitpicking.
Tara
Everyone at O'Reilly has been great in helping pull this book
together, but I wouldn't have gotten to participate in this book if
it hadn't been for Tim Allwine, who first helped me with Perl
programs a couple of years ago.
My family, especially my husband, has been great about
tolerating my distraction as I sat around muttering to myself
about variables and subroutines.
Even as this book was being written, I needed help
understanding what Perl could and couldn't do. Kevin Hemenway
was an excellent teacher, patiently explaining, providing
examples, and when all else failed, pointing and laughing at my
code.
Of course, most of this book wouldn't exist without the release of
Google's API. A big thanks to Google for building a playground
for us thousands of search-engine junkies. And just as big a
thanks to the many contributors who so generously allowed their
applications to appear in this book.
Finally, a big, big, he- gets - his - own- paragraph thanks to Rael
D ornfes t, who is a great c oauthor/editor and a lot of fun to work
with.
Rael
First and foremost, to Asha, Sam, and Mira: always my
inspiration, joy, and best friends.
My extended family and friends, both local and virtual, who'd
begun to wonder if they needed to send in a rescue party.
Brian Sawyer has, over the course of the last year, been my
production liaison, coeditor, editor, "man Friday," and friend.
Hat's off ;-) to Brian, and long may he stet.
I'd like to thank Dale Dougherty for bringing me in to work on the
Hacks series; it's been a circle of wide circumference from
Google Hacks to Google Hacks, Second Edition, and quite the
journey of discovery. The O'Reilly editors, production, product
management, and marketing staff are consummate
professionals, hackers, and mensches. Extra special thanks
goes out to my virtual cube-mate, Nat Torkington, to Laurie
Petrycki for showing me the ropes, and Tim O'Reilly for his
unfailing support and friendship.
Tara, it's been fabulous traveling this road with you, and I intend
to make sure our paths keep on crossing at interesting
intersections.
Karma points to Clay Shirky and Steven Johnson for egging me
on to do more with the Google API than late-night fiddling. And,
of course, a shout-out goes to the blogosphere population and
folks in my Google neighborhood for their inspired prattling on
APIs and all other things geek-worthy.
Preface
Search engines for large collections of data preceded the World
Wide Web by decades. There were those massive library
catalogs, hand-typed with painstaking precision on index cards
and eventually, to varying degrees, automated. There were the
large data collections of professional information companies
such as Dialog and LexisNexis. Then there are the extant
private, expensive medical, real estate, and legal search
services.
Those data collections were not always easy to search, but with
a little finesse and a lot of patience, it was always possible to
search them thoroughly. Information was grouped according to
established ontologies, the data preformatted according to
particular guidelines.
Then came the Web.
Information on the Web (as anyone who has ever looked at half a
dozen web pages knows) is not all formatted the same way. Nor is
it necessarily accurate. Nor up to date. Nor spellchecked.
Nonetheless, search engines cropped up, trying to make sense
of the rapidly increasing index of information online. Eventually,
special syntaxes were added for searching common parts of the
average web page (such as title or URL). Search engines
evolved rapidly, trying to encompass all the nuances of the
billions of documents online, and they continue to evolve today.
Google™ threw its hat into the ring in 1998. It was the second
incarnation of a search engine service known as BackRub, and
the name Google was a play on the word googol: a one followed by a
hundred zeros. From the beginning, Google was different from the
other major search engines online: AltaVista, Excite, HotBot, and
others.
Was it the technology? Partially. The relevance of Google's
search results was outstanding. But more than that, Google's
focus and more human face made it stand out online.
With its friendly presentation and constantly expanding set of
options, it's no surprise that Google continues to draw lots of
fans. There are weblogs devoted to it. Search engine
newsletters, such as ResearchBuzz, spend a lot of time covering
Google. Legions of devoted fans spend a lot of time uncovering
undocumented features, creating games (such as Google
whacking), and even coining new words (such as Googling, the
practice of checking out a prospective date or hire via Google's
search engine). People Google prospective employers and blind
dates; goods and services; school reports and movie reviews;
facts and fiction; fun and profit.
At the time of this writing, Google knows about more than eight
billion web pages, over 880 million images, and 845 million
Usenet messages and has just announced Google Print
(http://print.google.com), bringing even the printed word to the
Web.
In April 2002, Google reached out to its fan base by offering the
Google API. The Google API gives programmers a way to
access the Google search results with automated queries. While
you can do all the searching, sifting, and sorting by hand, there's
nothing like getting your computer to do it for you.
Google has changed the way people and computers alike
approach the Web.
Why Google Hacks?
Hacks are generally considered to be "quick-and-dirty"
solutions to programming problems or interesting techniques for
getting a task done. But what does this kind of hacking have to
do with Google?
Considering the size of the Google index, there are many times
when you might want to do a particular kind of search but you get
too many results for the search to be useful. Or you may want to
do a search that the current Google interface does not support.
The idea of Google Hacks is not to give you some exhaustive
manual of how every command in the Google syntax works
(although we do give this more than a fair shake), but rather to
show you some tricks for making the best use of a search, show
off just what's possible when you automate your queries with a
little programming know-how, and shine a light into some of the
overlooked corners of Google's offerings. In other words, hacks.
How This Book Is Organized
The combination of Google's myriad services and more than eight
billion pages of constantly shifting data can do strange things to
your imagination and give you lots of new perspectives on how
best to search. This book goes beyond the instruction page to
the idea of hacks: tips, tricks, and techniques you can use to
make your Google searching experience more fruitful, more fun,
or (in a couple of cases) just more weird.
This book is divided into several chapters:
Chapter 1, Web
This chapter describes the fundamentals of how
Google's search works. You'll find tips and tricks for
Google's special syntax (think "special sauce");
specialty searches like the phonebook, calculator,
package and stock tracking; the Google cache, related
links, and more. Beyond a mere list of "this syntax
means that," we'll take a look at how to eke every last
bit of searching power out of each syntax, and how to mix
and match for some truly monstrous searches.
Chapter 2, Advanced Web
Kick your newfound search expertise into high gear,
automating your trawling, crawling, and recombination by
hacking Google programmatically. You'll meander farther,
dig deeper, and come up with results that you never
would have found by letting your fingers do the walking
and eyeballs the scanning.
Chapter 3, Images
Take a break from all that text crawling and immerse
yourself in some of the millions of photographs,
drawings, icons, and diagrams that Google Images has
turned up in the process of crawling and indexing the
Web.
Chapter 4, News and Groups
Catch up on the day's news and events as brought to
you by Google News. Get involved in a group discussion
on the Internet or start a Google Group of your own.
Chapter 5, Add-Ons
Go beyond the web browser, integrating Google into your
toolbar, desktop, and word processor. Take advantage of
some of the services built on Google. Search on the go
via email or instant messenger, from your phone or PDA.
Chapter 6, Gmail
Google's Gmail™ isn't your average, ordinary web mail
service. From its slick, interactive, real-application-like
JavaScript-powered web interface to its gigabyte of
storage space, there's more than enough to make you
switch. And then there are the alternate uses you just
won't believe until you try.
Chapter 7, Ads
Google's AdSense™ brings subtle, mostly text ads to
your web site or weblog, no matter how large or how small.
And AdWords™ breaks down the barrier to entry for
getting your business or project found on the Net.
Chapter 8, Webmastering
If you're a web wrangler, you see Google from two sides:
from the searcher side and from the side of someone who
wants to get the best search ranking for a web site. In
this chapter, you'll learn about Google's (in)famous
PageRank™, how to clean up for a Google visit, and how
to make sure that your pages aren't indexed by Google if
you don't want them to be.
Chapter 9, Programming Google
This chapter introduces you to the wonders of the
Google Application Programming Interface (API)
underlying many of the hacks in this book. If you've ever
been tempted to try your hand at programming, this is as
good a place as any to find inspiration.
How to Use This Book
You can read this book from cover to cover if you like, but for the
most part, each hack stands on its own. So feel free to browse,
flipping around whatever sections interest you most. If you're a
Perl newbie, you might want to try some of the easier hacks and
then tackle the more extensive ones as you get more confident.
How to Run the Hacks
The programmatic hacks in this book run either on the command
line (that's Terminal for Mac OS X folk, DOS command window
for Windows users) or as CGI scripts: dynamic pages living on
your web site, accessed through your web browser.
Command-Line Scripts
Running a hack on the command line invariably involves the
following steps:
1. Type the program into a garden-variety text editor:
Notepad on Windows, TextEdit on Mac OS X, vi or Emacs
on Unix/Linux, or anything else of the sort. Save the file
as directed, usually as scriptname.pl (the pl bit stands for
Perl, the predominant programming language used in
Google Hacks).
Alternatively, you can download the code for all of the
hacks online at
http://www.oreilly.com/catalog/googlehks2, a ZIP
archive filled with individual scripts already saved as
text files.
2. Get to the command line on your computer or remote
server. In Mac OS X, launch the Terminal (Applications →
Utilities → Terminal). In Windows, click the Start
button, select Run..., type command, and hit the
Enter/Return key on your keyboard. In Unix ... well, we'll
just assume you know how to get to the command line.
3. Navigate to where you saved the script at hand. This
varies from operating system to operating system, but
usually involves something like cd ~/Desktop (that's your
Desktop on the Mac).
4. Invoke the script by running the programming
language's interpreter (e.g., Perl) and feeding it the
script (e.g., scriptname.pl) like so:
$ perl scriptname.pl
5. Most often, you'll also need to pass along some
parameters: your search query, the number of results
you'd like, and so forth. Simply drop them in after the
script name, enclosing them in quotes if they're more
than one word or if they include an odd character or
three:
$ perl scriptname.pl '"much ado about nothing" script' 10
6. The results of your script are almost always sent
straight back to the command-line window in which
you're working, like so:
$ perl scriptname.pl '"much ado about nothing" script' 10
1. "Amazon.com: Books: Much Ado About Nothing: Screenplay
[http://www.amazon.com/exec/obidos/tg/detail/-/0393311112?
2. "Much Ado About Nothing Script"
[http://www.signal42.com/much_ado_about_nothing_script.asp
...
The ellipsis (...) bit signifies that
we've cut off the output for
brevity's sake.
7. To stop output scrolling off your screen faster than you
can read it, on most systems you can "pipe" (read:
redirect) the output to a little program called more:
$ perl scriptname.pl | more
Hit the Enter/Return key on your keyboard to scroll
through line by line, the space bar to leap through page
by page.
You'll also sometimes want to direct output to a file for
safekeeping, importing into your spreadsheet
application, or displaying on your web site. This is just as
easy:
$ perl scriptname.pl > output_filename.txt
And to pour some input into your script from a file,
simply do the opposite:
$ perl scriptname.pl < input_filename.txt
Don't worry if you can't remember all of this; each hack has a
"Running the Hack" section, and some even have a "The
Results" section that shows you just how it's done.
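The parameter handling described in step 5 can be sketched in a few lines. The hacks themselves are written in Perl; this is only a rough Python illustration, and the function name parse_args and the default of 10 results are assumptions made for the sketch, not anything taken from the book's scripts.

```python
DEFAULT_RESULTS = 10  # assumed default; each real hack decides its own


def parse_args(argv):
    """Return (query, max_results) from a command line like the ones above."""
    prog = argv[0] if argv else "scriptname"
    if len(argv) < 2:
        raise SystemExit(f"usage: {prog} 'query' [max_results]")
    query = argv[1]  # the shell has already stripped the outer quotes
    max_results = int(argv[2]) if len(argv) > 2 else DEFAULT_RESULTS
    return query, max_results


# Simulates: scriptname '"much ado about nothing" script' 10
query, max_results = parse_args(
    ["scriptname", '"much ado about nothing" script', "10"]
)
print(f"query={query!r} max_results={max_results}")
```

In a real script you would pass sys.argv instead of the hand-built list. Note that the single quotes on the command line keep the inner double quotes intact, so the phrase reaches the script as part of one argument.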
CGI Scripts
CGI scripts (programs that run on your web site and produce
pages dynamically) are a little more complicated if you're not
used to them. While fundamentally they're the same sort of
scripts as those run on the command line, they are more
troublesome because setups vary so widely. You may be running
your own server, your web site may be hosted on an Internet
service provider's (ISP) server, your content may live on a
corporate intranet server, or anything in between.
Since going through every possibility is beyond the scope of this
(or any) book, you should check your ISP's knowledge base or
call their technical support department, or ask your local system
administrator for help.
Generally, though, the methodology is the same:
1. Type the program into a garden-variety text editor:
Notepad on Windows, TextEdit on Mac OS X, vi or Emacs
on Unix/Linux, or anything else of the sort. Save the file
as directed, usually as scriptname.cgi (the cgi bit reveals
that you're dealing with a CGI, that's common gateway
interface, script).
2. Alternatively, you can download the code for all of the
hacks online at
http://www.oreilly.com/catalog/googlehks2, a ZIP
archive filled with individual scripts already saved as
text files.
3. Move the script over to wherever your web site lives.
You should have some directory on a server somewhere
in which all of your web pages (all those .html files) and
images (ending in .jpg, .gif, etc.) live. Within this
directory, you'll probably see something called a cgi-bin
directory: this is where CGI scripts must usually live in
order to be run rather than just displayed in your web
browser when you visit them.
4. You usually need to bless CGI scripts as executable, to
be run rather than displayed. Just how you do this
depends on the operating system of your server. If
you're on a Unix/Linux or Mac OS X system, this usually
entails typing the following on the command line:
$ chmod 755 scriptname.cgi
5. Now you should be able to point your web browser at the
script and have it run as expected, behaving in a manner
similar to that described in the "Running the Hack"
section of the hack at hand.
6. Just what URL you use once again varies wildly. It
should, however, look something like this:
http://www.your_domain.com/cgi-bin/scriptname.cgi,
where your_domain.com is your web site domain, cgi-bin
refers to the directory in which your CGI scripts live,
and scriptname.cgi is the script itself.
7. If you don't have a domain and are hosted at an ISP, the
URL is more likely to look like this:
http://www.your_isp.com/~your_username/cgi-bin/scriptname.cgi,
where your_isp.com is your ISP's
domain, ~your_username is your username at the ISP, cgi-bin
refers to the directory in which your CGI scripts live,
and scriptname.cgi is the script itself.
If you come up with something called an "Internal Server Error"
or see the error code 500, something's gone wrong somewhere
in the process. At this point you can take a crack at debugging
(read: shaking the bugs out) yourself or ask your ISP or system
administrator for help. Debugging, especially CGI debugging, can
be a little more than the average newbie can bear, but there is
help in the form of a famous Frequently Asked Question (FAQ):
"The Idiot's Guide to Solving Perl CGI Problems." Google for it
and step through as directed.
Using the Google API
Be sure to consult Chapter 9 for an introduction to the Google
API, how to sign up for a developer's key (you'll need one for
many of the hacks in this book), and the basics of programming
Google in a selection of languages to get you going.
Learning to Code
Fancy trying your hand at a spot of programming? O'Reilly's
best-selling Learning Perl (http://www.oreilly.com/catalog/lperl3)
by Randal L. Schwartz and Tom Phoenix provides a good start.
Apply what you learn to understanding and using the hacks in
this book, perhaps even taking on the "Hacking the Hack"
sections to tweak and fiddle with the scripts. This is a useful way
to get a little programming under your belt if you're a searching
nut, since it's always a little easier to learn how to program when
you have a task to accomplish and existing code to leaf through.
Where to Go for More
There's so much to Google that it's easy to miss minor tweaks
and major new offerings alike. Pay a regular visit to the Google
"More, more, more" page (http://www.google.com/options). Stay
on top of all things Google by reading or subscribing to the
Google blogs, unofficial (http://google.blogspace.com) and
official (http://www.google.com/googleblog).
Ga-ga over Google? Pick up a Google-branded tchotchke (green
lava lamp, double latte mug, t-shirt, backpack, or book) at the
official Google Store (http://www.googlestore.com).
Conventions
The following is a list of the typographical conventions used in
this book:
Italic
Used to indicate new terms, URLs, filenames, file
extensions, directories, and program names, and to
highlight comments in examples. For example, a path in
the filesystem will appear as /Developer/Applications.
Constant width
Used to show code examples, commands and options,
contents of files, or output from commands.
Constant width bold
Used for emphasis and user input in code.
Constant width italic
Used in examples and tables to show text that should be
replaced with user-supplied values.
You should pay special attention to notes set apart from the text
with the following icons:
This is a tip, suggestion, or general note.
It contains useful supplementary
information about the topic at hand.
This is a warning or note of caution, often
indicating that your money or your privacy
might be at risk.
The thermometer icons, found next to each hack, indicate the
relative complexity of the hack:
beginner moderate expert
Using Code Examples
This book is here to help you get your job done. In general, you
may use the code in this book in your programs and
documentation. You do not need to contact us for permission
unless you're reproducing a significant portion of the code. For
example, writing a program that uses several chunks of code
from this book does not require permission. Selling or
distributing a CD-ROM of examples from O'Reilly books does
require permission. Answering a question by citing this book and
quoting example code does not require permission.
Incorporating a significant amount of example code from this
book into your product's documentation does require permission.
We appreciate, but do not require, attribution. An attribution
usually includes the title, author, publisher, and ISBN. For
example: "Google Hacks, Second Edition, by Tara Calishain and
Rael Dornfest. Copyright 2004 O'Reilly Media, Inc.,
0-596-00857-0."
If you feel your use of code examples falls outside fair use or the
permission given above, feel free to contact us at
permissions@oreilly.com.
Safari Enabled
When you see a Safari® enabled icon on the cover of your
favorite technology book, that means the book is available online
through the O'Reilly Network Safari Bookshelf.
Safari offers a solution that's better than e-Books. It's a virtual
library that lets you easily search thousands of top tech books,
cut and paste code samples, download chapters, and find quick
answers when you need the most accurate, current information.
Try it free at http://safari.oreilly.com.
How to Contact Us
We have tested and verified the information in this book to the
best of our ability, but you may find that features have changed
(or even that we have made mistakes!). As a reader of this book,
you can help us to improve future editions by sending us your
feedback. Please let us know about any errors, inaccuracies,
bugs, misleading or confusing statements, and typos that you
find anywhere in this book.
Please also let us know what we can do to make this book more
useful to you. We take your comments seriously and will try to
incorporate reasonable suggestions into future editions. You can
write to us at:
O'Reilly Media, Inc.
1005 Gravenstein Hwy N.
Sebastopol, CA 95472
(800) 998-9938 (in the U.S. or Canada)
(707) 829-0515 (international/local)
(707) 829-0104 (fax)
To ask technical questions or to comment on the book, send
email to:
bookquestions@oreilly.com
The web site for Google Hacks, Second Edition, lists examples,
errata, and plans for future editions. You can find this page at:
http://www.oreilly.com/catalog/googlehks2
For more information about this book and others, see the
O'Reilly web site:
http://www.oreilly.com
Got a Hack?
To explore Hacks books online or to contribute a hack for future
titles, visit:
http://hacks.oreilly.com
Chapter 1. Web
Hacks 1-20
Section 1.2. Google Web Search Basics
Section 1.3. Full-Word Wildcards
Section 1.4. The 10-Word Limit
Section 1.5. Special Syntax
Section 1.6. Mixing Syntax
Section 1.7. Advanced Search
Section 1.8. Quick Links
Section 1.9. Language Tools
Section 1.10. Anatomy of a Search Result
Section 1.11. Setting Preferences
Section 1.12. Understanding Google URLs
Hack 1. Browse the Google Directory
Hack 2. Glean a Snapshot of Google in Time
Hack 3. Graph Google Results over Time
Hack 4. Visualize Google Results
Hack 5. Check Your Spelling
Hack 6. Google Phonebook: Let Google's Fingers Do the Walking
Hack 7. Think Global, Google Local
Hack 8. Track Stocks
Hack 9. Consult the Dictionary
Hack 10. Look Up Definitions
Hack 11. Search Article Archives
Hack 12. Find Directories of Information
Hack 13. Seek Out Weblog Commentary
Hack 14. Cover Your Bases
Hack 15. Repetition Matters
Hack 16. Search a Particular Date Range
Hack 17. Calculate Google Centuryshare
Hack 18. Hack Your Own Google Search Form
Hack 19. Go Beyond Google's Advanced Search
Hack 20. Use Google Tools for Translators
Hacks 1-20
Google's front page is deceptively simple: a search form and a
couple of buttons. Yet that basic interface, so alluring in its
simplicity, belies the power of the Google engine underneath and
the wealth of information at its disposal. If you use Google's
search syntax to its fullest, the Web is your oyster.
Searching in Google doesn't have to be a case of just entering
what you're looking for in the search box and hoping for the best.
Google offers you many ways (via special syntax and search
options) to refine your search criteria and help Google better
understand what you're looking for. We'll dig into Google's
powerful, all-but-undocumented special syntax and search
options, and show how to use them to their fullest. We'll cover
the basics of Google searching, wildcards, word limits, syntax for
special cases, mixing syntax elements, advanced search
techniques, and using specialized vocabularies, including slang
and jargon.
1.2. Google Web Search Basics
Whenever you search for more than one keyword at a time, a
search engine has a default strategy for handling and combining
those keywords. Can those words appear individually anywhere
in a page, or do they have to be right next to each other? Will the
engine search for both keywords or for either keyword?
1.2.1. Phrase Searches
Google defaults to searching for occurrences of your specified
keywords anywhere in the page, whether side-by-side or
scattered throughout. To return results of pages containing
specifically ordered words, enclose them in quotes, turning your
keyword search into a phrase search, to use Google's
terminology.
On entering a search for the keywords:
to be or not to be
Google will find matches where the keywords appear anywhere
on the page. If you want Google to find you matches where the
keywords appear together as a phrase, surround them with
quotes, like this:
"to be or not to be"
Google will return matches only where those words appear
together (not to mention explicitly including stop words such as
"to" and "or"; see the section "Explicit Inclusion" a little later).
Phrase searches are also useful when you want to find a phrase
but aren't quite sure of the exact wording. This is accomplished
in combination with wildcards, explained later in the chapter in
"Full-Word Wildcards."
1.2.2. Basic Boolean
Whether an engine searches for all keywords or any of them
depends on what is called its Boolean default. Search engines can
default to Boolean AND (searching for all keywords) or Boolean OR
(searching for any keywords). Of course, even if a search engine
defaults to searching for all keywords, you can usually give it a
special command to instruct it to search for any keyword.
Lacking specific instructions, the engine falls back on its default
setting.
Google's Boolean default is AND, which means that, if you enter
query words without modifiers, Google will search for all of your
query words. For example, if you search for:
snowblower Honda "Green Bay"
Google will search for all the words. If you prefer to specify that
any one word or phrase is acceptable, put an OR between each:
snowblower OR snowmobile OR "Green Bay"
Make sure you capitalize the OR; a
lowercase or won't work correctly.
If you particularly want one term along with one of two or more
other terms, group them with parentheses, like so:
snowblower (snowmobile OR "Green Bay")
This query searches for the word "snowmobile" or phrase "Green
Bay" along with the word "snowblower." A stand-in for OR,
borrowed from the computer programming realm, is the | (pipe)
character, as in:
snowblower (snowmobile | "Green Bay")
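If you end up assembling queries in a script rather than typing them, the phrase and Boolean rules above boil down to a little string handling. Here is a minimal sketch under that assumption; the helper names term, any_of, and all_of are invented for illustration and are not part of any Google tool or API.

```python
def term(text):
    """Quote multi-word text so it is searched as an exact phrase."""
    text = text.strip()
    return f'"{text}"' if " " in text else text


def any_of(*alternatives):
    """Group alternatives with an uppercase OR (lowercase won't work)."""
    return "(" + " OR ".join(term(t) for t in alternatives) + ")"


def all_of(*parts):
    """Space-separated parts rely on Google's default Boolean AND."""
    return " ".join(parts)


print(term("to be or not to be"))
print(all_of("snowblower", "Honda", term("Green Bay")))
print(all_of("snowblower", any_of("snowmobile", "Green Bay")))
```

The three print calls reproduce the phrase search and the two snowblower queries shown above. Keeping the OR inside any_of means the capitalization rule can't be forgotten by accident.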
1.2.3. Negation
If you want to specify that a query item must not appear in your
results, prepend a - (minus sign or dash):
snowblower snowmobile -"Green Bay"
This will search for pages that contain both the words
"snowblower" and "snowmobile," but not the phrase "Green Bay."
Note that the - symbol must appear directly before the word or
phrase that you don't want. If there's space between, as in the
following query, it won't work as expected:
snowblower snowmobile - "Green Bay"
Do be sure, however, that there's a space before the - symbol.
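The spacing rule is easy to get wrong by hand and easy to get right in code. The following sketch (the names without and build_query are hypothetical helpers, not anything Google provides) glues the minus sign directly to the term and lets the join supply the space before it.

```python
def term(text):
    """Quote multi-word text so it is treated as a phrase."""
    text = text.strip()
    return f'"{text}"' if " " in text else text


def without(text):
    """Negate a term: the - goes directly before it, with no space."""
    return "-" + term(text)


def build_query(required, excluded=()):
    """Join required and negated terms with single spaces."""
    parts = [term(t) for t in required] + [without(t) for t in excluded]
    return " ".join(parts)


print(build_query(["snowblower", "snowmobile"], excluded=["Green Bay"]))
```

This prints the working form of the query from the text, with a space before the minus sign and none after it.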
1.2.4. Explicit Inclusion
On the whole, Google will search for all the keywords and
phrases that you specify (with the exception of those you've
specifically negated with -, of course). However, there are certain
words that Google will ignore because they are considered too
common to be of any use in the search. These words ("I," "a,"
"the," and "of," to name a few) are called stop words.
You can force Google to take a stop word into account by
prepending a + (plus) character, as in:
+the king
Stop words that appear inside of phrase searches are not
ignored. Searching for:
"the move" glam
will result in a more accurate list of matches than:
the move glam
simply because Google takes the word "the" into account in the
first example but ignores it in the second.
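As a rough illustration of explicit inclusion, a script could mark stop words with + before sending a query. The sketch below assumes a tiny stop-word set containing only the four examples named in the text; Google's real list is not published here, so treat STOP_WORDS and the function name as placeholders.

```python
# Only the examples named in the text; the real list is an unknown.
STOP_WORDS = {"i", "a", "the", "of"}


def include_stop_words(words):
    """Prefix each stop word with + so it is not ignored."""
    marked = ["+" + w if w.lower() in STOP_WORDS else w for w in words]
    return " ".join(marked)


print(include_stop_words(["the", "king"]))
print(include_stop_words(["the", "move", "glam"]))
```

The first call yields the +the king query from the text. For a case like the second, quoting "the move" as a phrase is usually the better choice, since it also pins down word order.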
1.2.5. Synonyms
Every so often you get the feeling that you're missing out on
some useful results because the keyword or keywords you've
chosen aren't the only way to express what you're looking for.
The Google synonym operator, the ~ (tilde) character, prepended
to any number of keywords in your query, asks Google to include
not only exact matches, but also what it thinks are synonyms for
each of the keywords. Searching for:
~ape
turns up results for monkey, gorilla, chimpanzee, and others
(both singular and plural forms) of the ape or related family, as if
you'd searched for:
ape OR monkey OR gorilla OR chimpanzee
along with some you'd never have thought to include in your query.
Google figures out synonyms algorithmically, so you may well be
surprised to find results around words that your garden-variety
thesaurus would not have suggested. (Synonyms are bolded
along with exact keyword matches on the results page, so
they're easy to spot.)
1.2.6. Number Range
O ne of the more diffic ult things to c onvey in an I nternet s earc h
query is a rangeo f dates , c urrenc y, s ize, weight, height, or any
two arbitrary values .
T he number range operator, .. (two periods ), looks for res ults
falling ins ide your s pec ified numeric range.
L ooking for that perfec t pair of P rada pumps , s ize 5 or 6 ? Try
this for s ize:
prada pumps size 5..6
P erhaps you're looking to s pend $ 8 0 0 to $ 1 ,0 0 0 on a nic e
digital SL R c amera; G oogle for:
slr digital camera 3..5 megapixel $800..1000
T he one thing to remember is always to provide s ome c lue as to
the meaning of the range, e.g., $, size, megapixel, kg, and s o forth.
You c an als o us e the number range s yntax with jus t one number,
making it the minimum or maximum of your query. D o you want
to find s ome land in M ontana that's at leas t 5 0 0 ac res ? N o
problem:
acres Montana land 500..
O n the other hand, you may want to make s ure that rainc oat you
buy for your terrier does n't c os t more than $ 3 0 . T hat's pos s ible
too:
raincoat dog ..$30
Google normally does not recognize
special characters like $ in the process of
searching. But because the $ sign was
necessary for the number feature, you can
use it in all sorts of searches. Try the
search "yard sale" bargains 10 and then
"yard sale" bargains $10. Notice how the
second search gives you far fewer results?
That's because Google is matching $10
exactly.
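The range forms shown in this section (bounded, minimum-only, and maximum-only) can be put together mechanically. The following is a minimal sketch, not anything Google provides; the function name number_range and its prefix parameter are hypothetical, chosen only to mirror the examples above.

```python
def number_range(low=None, high=None, prefix=""):
    """Build a number range term such as '5..6', '$800..1000', '500..', or '..$30'.

    `prefix` is an optional unit clue (for example '$') that is attached to the
    first number that appears, matching the examples in the text.
    """
    if low is None and high is None:
        raise ValueError("give at least one bound")
    if low is None:
        # Maximum only: everything up to `high`.
        return f"..{prefix}{high}"
    if high is None:
        # Minimum only: everything from `low` upward.
        return f"{prefix}{low}.."
    return f"{prefix}{low}..{high}"


# Assemble full queries around the range, always including a clue word.
camera_query = f"slr digital camera {number_range(3, 5)} megapixel {number_range(800, 1000, prefix='$')}"
land_query = f"acres Montana land {number_range(low=500)}"
```

Keeping the clue word (size, megapixel, acres) outside the helper is deliberate: the range term alone says nothing about what is being measured.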
1.2.7. Simple Searching and Feeling Lucky
The I'm Feeling Lucky button is a thing of beauty. Rather than
giving you a list of search results from which to choose, you're
whisked away to what Google believes is the most relevant page
given your search (i.e., the first result in the list). Entering
washington post and clicking the I'm Feeling Lucky button takes
you directly to http://www.washingtonpost.com. Trying president
will land you at http://www.whitehouse.gov.
1.2.8. Case Sensitivity
Some search engines are case sensitive; that is, they search for
queries based on how the queries are capitalized. A search for
"GEORGE WASHINGTON" on such a search engine would not find
"George Washington," "george washington," or any other case
combination.
Google is case insensitive. If you search for Three, three, THREE,
or even ThrEE, you get the same results.
1.3. Full-Word Wildcards
Some search engines support a technique called stemming.
Stemming is adding a wildcard character, usually * (asterisk) but
sometimes ? (question mark), to part of your query, requesting the
search engine return variants of that query using the wildcard as
a placeholder for the rest of the word at hand. For example, moon*
would find moons, moonlight, moonshot, etc.
Google doesn't support explicit stemming. It didn't used to
support stemming at all, but now it implicitly stems for you. So,
dietary will yield results for diet, diets, and other variations on
the theme.
Google does offer a full-word wildcard. While you can't have a
wildcard stand in for part of a word, you can insert a wildcard
(Google's wildcard character is *) into a phrase and have the
wildcard act as a substitute for one full word. Searching for
"three * mice", therefore, finds three blind mice, three blue mice,
three green mice, etc.
What good is the full-word wildcard? It's certainly not as useful
as stemming, but then again, it's not as confusing to the
beginner. One * is a stand-in for one word; two * signifies two
words, and so on. The full-word wildcard comes in handy in the
following situations:
• Avoiding the 10-word limit (see "The 10-Word Limit"
  next) on Google queries. You'll most frequently run into
  this limit when you're trying to find song lyrics or
  a quote. Plugging the phrase Fourscore and seven years
  ago, our fathers brought forth on this continent into
  Google will search only as far as the word "on";
  everything thereafter is summarily ignored by Google.
• Checking the frequency of certain phrases and
  derivatives of phrases, such as intitle:"methinks the *
  doth protest too much" and intitle:"the * of Seville"
  (intitle: is described later in "Special Syntax").
• Filling in the blanks on a fitful memory. Perhaps you
  remember only a short string of song lyrics; search
  using only what you remember rather than randomly
  reconstructed full lines.
  Let's take as an example the disco anthem "Good
  Times" by Chic. Consider the following line: "You silly
  fool, you can't change your fate."
  Perhaps you've heard that lyric, but you can't remember
  if the word "fool" is correct or if it's something else. If
  you're wrong (if the correct line is, for example, "You
  silly child, you can't change your fate"), your search will
  find no results and you'll come away with the sad
  conclusion that no one on the Internet has bothered to
  post lyrics to Chic songs.
  The solution is to run the query with a wildcard in place
  of the unknown word, like so:
  "You silly *, you can't change your fate"
  You can use this technique for quotes, song lyrics,
  poetry, and more. You should be mindful, however, to
  include enough of the quote to find unique results.
  Searching for "you * fool" will glean you far too many
  irrelevant hits.
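Swapping an uncertain word for the full-word wildcard is easy to do by hand, but a small helper makes the one-asterisk-per-word rule explicit. This is an illustrative sketch only; wildcard_phrase and its arguments are hypothetical names, not part of any Google tool.

```python
def wildcard_phrase(words, unknown_positions):
    """Return a quoted phrase with the words at `unknown_positions` replaced by *.

    `words` is a list of words; `unknown_positions` is a set of zero-based
    indexes. Each unknown word becomes exactly one asterisk, because one *
    stands in for one full word.
    """
    parts = ["*" if i in unknown_positions else w for i, w in enumerate(words)]
    return '"' + " ".join(parts) + '"'


# The half-remembered lyric: the third word (index 2) is the uncertain one.
lyric = ["You", "silly", "fool", "you", "can't", "change", "your", "fate"]
query = wildcard_phrase(lyric, {2})
```

The resulting query drops the comma after the unknown word, which should make no practical difference.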
1.4. The 10-Word Limit
Unless you're fond of long, detailed queries, you might never
have noticed that Google has a hard limit of 10 words (that's
keywords and special syntaxes combined), ignoring anything
beyond. While this has no real effect on casual Google users,
search hounds quickly find that this limit rather cramps their
style.
1.4.1. Favor Obscurity
By limiting your query to the more obscure of your keywords or
phrase fragments, you'll hone results without squandering
precious query words. Let's say you're interested in a phrase
from Hamlet: "The lady doth protest too much, methinks." At
first blush, you might simply paste the entire phrase into the
query field. But that's 7 of your 10 allotted words right there,
leaving little room for additional query words or search syntax.
The first thing to do is ditch the first couple of words; "The lady"
is just too common a phrase. This leaves the 5 words "doth
protest too much, methinks." Neither "methinks" nor "doth" is a
word that you might hear every day, so either provides a nice
Shakespearean anchor for the phrase. That said, one or the
other should suffice, leaving the query at an even 4 words with
room to grow:
"protest too much methinks"
or:
"doth protest too much"
Either of these will provide, within the first five results, origins of
the phrase and pointers to more information.
Unfortunately, this technique won't do you much good in the
case of "Do as I say, not as I do," which doesn't provide much in
the way of obscurity. Attempt clarification by adding something
like quote origin English usage and you're stepping beyond the
10-word limit. One solution is described next.
1.4.2. Playing the Wildcard
Help comes in the form of Google's full-word wildcard, described
earlier. It turns out that Google doesn't count wildcards toward
the limit.
So, when you have more than 10 words, substitute a wildcard for
common words, like so:
"do as * say not as * do" quote origin English usage
Presto! Google runs the search without complaint, and you're in
for some well-honed results.
Common words such as "I," "a," "the," and
"of" actually do no good in the first place.
Called stop words, they are ignored by
Google entirely unless used within a
phrase. To force Google to take a stop
word into account, prepend it with a +
(plus) character, as in: +the.
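The bookkeeping behind the 10-word limit can be sketched in a few lines. This is only an approximation for planning queries: it assumes words are separated by whitespace and that each special syntax element counts as one word, and it applies the rule from the text that wildcards are not counted. The names WORD_LIMIT, counted_words, and within_limit are hypothetical.

```python
WORD_LIMIT = 10  # the hard limit described in the text


def counted_words(query):
    """Return the words that count toward the limit (wildcards excluded)."""
    # Quotation marks group words into a phrase but are not words themselves.
    tokens = query.replace('"', " ").split()
    return [t for t in tokens if t != "*"]


def within_limit(query):
    """True if the query fits inside the 10-word limit."""
    return len(counted_words(query)) <= WORD_LIMIT


too_long = '"do as I say not as I do" quote origin English usage'
trimmed = '"do as * say not as * do" quote origin English usage'
```

Here too_long counts 12 words and trimmed counts exactly 10, which is why the wildcard version runs without complaint.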
1.5. Special Syntax
In addition to the basic AND, OR, and phrase searches, Google
offers some rather extensive special syntax for narrowing your
searches.
As a full-text search engine, Google indexes entire web pages
instead of just titles and descriptions. Additional commands,
called special syntax or advanced operators, let Google users
search specific parts of web pages for specific types of
information. This comes in handy when you're dealing with more
than eight billion web pages and need every opportunity to
narrow your search results. Specifying that your query words
must appear only in the title or URL of a returned web page is a
great way to have your results get very specific without making
your keywords themselves too specific. Following are
descriptions of the special syntax elements, ordered by common
usage and function.
Some of these syntax elements work well
in combination. Others fare not quite as
well. Still others do not work at all. For
detailed discussion on what does and does
not mix, see "Mixing Syntax," below.
intitle:
intitle: restricts your search to the titles of web pages.
The variation allintitle: finds pages wherein all the
words specified appear in the title of the web page. Using
allintitle: is basically the same as using intitle:
before each keyword.
intitle:"george bush"
allintitle:"money supply" economics
You may wish to avoid the allintitle: variation, because
it doesn't mix well with some of the other syntax
elements.
intext:
intext: searches only body text (i.e., ignores link text,
URLs, and titles). While its uses are limited, it's perfect
for finding query words that might be too common in
URLs or link titles.
intext:"yahoo.com"
intext:html
There's an allintext: variation, but again, this doesn't
play well with others.
inanchor:
inanchor: searches for text in a page's link anchors. A
link anchor is the descriptive text of a link. For example,
the link anchor in the HTML code <a
href="http://www.oreilly.com">O'Reilly Media</a> is
"O'Reilly Media."
inanchor:"tom peters"
As with other in*: syntax elements, there's an
allinanchor: variation, which works in a similar way (i.e.,
all the keywords specified must appear in a page's link
anchors).
site:
site: allows you to narrow your search by either a site or
a top-level domain. The AltaVista search engine, by
contrast, has two syntax elements for this function
(host: and domain:), but Google has only the one.
site:loc.gov
site:thomas.loc.gov
site:edu
site:nc.us
Be aware that site: is no good for trying to search for a
page that exists beneath the main or default site (i.e., in
a subdirectory such as /~sam/album/). For example, if
you're looking for something below the main GeoCities
site, you can't use site: to find all the pages in
http://www.geocities.com/Heartland/Meadows/6485/;
Google returns no results. Use inurl: instead.
inurl:
inurl: restricts your search to the URLs of web pages.
This syntax tends to work well for finding search and
help pages, because they tend to be rather regular in
composition. An allinurl: variation finds all the words
listed in a URL but doesn't mix well with some other
special syntax.
inurl:help
allinurl:search help
You'll see that using the inurl: query instead of the site:
query has one immediate advantage: you can use it to
search subdirectories.
While the http:// prefix in a URL is
ignored by Google when used with
site:, search results come up short
when including it in an inurl: query.
Be sure to remove prefixes in any
inurl: query for the best (read: any)
results.
You can also use inurl: in combination with the site:
syntax to draw out information on subdomains. For
example, how many subdomains does oreilly.com really
have? A quick query will help you figure that out:
site:oreilly.com -inurl:www.oreilly.com
This query asks Google to list all pages from the
oreilly.com domain, but leave out those pages that are
from the common subdomain www, since you already
know about that one.
link:
link: returns a list of pages linking to the specified URL.
Enter link:www.google.com and you'll get a list of pages
that link to the Google home page, www.google.com (not
anywhere in the google.com domain). Don't worry about
including the http:// bit; you don't need it and, indeed,
Google appears to ignore it even if you do put it in. link:
works just as well with "deep" URLs
(http://www.raelity.org/apps/blosxom/, for
instance) as with top-level URLs such as raelity.org.
cache:
cache: finds a copy of the page that Google indexed even
if that page is no longer available at its original URL or
has since changed its content completely.
cache:www.yahoo.com
If Google returns a result that appears to have little to
do with your query, you're almost sure to find what you're
looking for in the latest cached version of the page at
Google.
The Google cache is particularly useful for retrieving a
previous version of a page that changes often.
daterange:
daterange: limits your search to a particular date or
range of dates on which a page was indexed. It's
important to note that a daterange: search has nothing to
do with when a page was created, but when it was
indexed by Google. So a page created on February 2 but
not indexed by Google until April 11 would turn up in a
daterange: search for April 11.
"Geri Halliwell" "Spice Girls" daterange:2450958-2450968
For an in-depth treatment of finding content either by the
date it was created or when it was first noticed by
Google, see [Hack #16].
filetype:
filetype: searches the suffixes or filename extensions.
These are usually, but not necessarily, different file
types; filetype:htm and filetype:html will give you
different result counts, even though they're the same file
type. You can even search for different page
generators such as ASP, PHP, CGI, and so
forth, presuming the site isn't hiding them behind
redirection and proxying. Google indexes several
different Microsoft formats, including PowerPoint (.ppt),
Excel (.xls), and Word (.doc).
homeschooling filetype:pdf
"leading economic indicators" filetype:ppt
related:
related:, as you might expect, finds pages that are
related to the specified page. This is a good way to find
categories of pages; a search for related:google.com
returns a variety of search engines, including Lycos,
Yahoo!, and Northern Light.
related:www.yahoo.com
related:www.cnn.com
While an increasingly rare occurrence, you'll find that
not all pages are related to other pages.
info:
info: provides a page of links to more information about
a specified URL. This information includes a link to the
URL's cache, a list of pages that link to the URL, pages
that are related to the URL, and pages that contain the
URL.
info:www.oreilly.com
info:www.nytimes.com/technology
Note that this information is dependent on whether
Google has indexed the specified URL; if not, information
will obviously be far more limited.
phonebook:
phonebook:, as you might expect, looks up phone
numbers.
phonebook:John Doe CA
phonebook:(510) 555-1212
The phonebook is covered in detail in [Hack #6].
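The numbers in the daterange: example earlier in this section (2450958-2450968) are Julian day numbers rather than calendar dates, which makes the syntax awkward to type by hand. The following is a minimal sketch of the conversion using Python's standard datetime module; the function names julian_day and daterange are hypothetical, and the offset constant is simply the difference between the Julian day number and Python's own day count.

```python
from datetime import date

# Julian day number minus Python's proleptic Gregorian ordinal.
# Check: date(2000, 1, 1).toordinal() == 730120, and that day is JD 2451545.
JULIAN_OFFSET = 1721425


def julian_day(d):
    """Return the integer Julian day number for a calendar date."""
    return d.toordinal() + JULIAN_OFFSET


def daterange(start, end):
    """Build a daterange: term covering start through end, inclusive."""
    return f"daterange:{julian_day(start)}-{julian_day(end)}"


# The range in the text's example works out to 24 May through 3 June 1998.
term = daterange(date(1998, 5, 24), date(1998, 6, 3))
query = f'"Geri Halliwell" "Spice Girls" {term}'
```

Remember that the dates refer to when Google indexed a page, not when the page was created.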
1.6. Mixing Syntax
There was a time when you couldn't mix Google's special syntax
elements; you were limited to one per query. Even as Google
released ever more powerful special syntax elements, not being
able to combine them for their composite power stunted many a
search.
This has since changed. While there remain some syntax
elements that you just can't mix, there are plenty to combine in
clever and powerful ways. A thoughtful combination can do
wonders to narrow a search.
1.6.1. How Not to Mix Syntax
There are some simple rules to follow when mixing syntax
elements. These, for the most part, revolve around how not to
mix.
• Don't mix syntax elements that will cancel each other
  out, such as:
  site:ucla.edu -inurl:ucla
  Here you're saying you want all results to come from
  ucla.edu, but that those results should not have the string
  "ucla" in their URLs. Obviously, that's not going to
  produce many results.
• Don't overuse single syntax elements, as in:
  site:com site:edu
  While you might think you're asking for results from
  either .com or .edu sites, what you're actually saying is
  that site results should come from both simultaneously.
  Obviously, a single result can come from only one
  domain. Take the example perl site:edu site:com. This
  search will get you exactly zero results. Why? Because a
  result page cannot come from a .edu domain and a .com
  domain at the same time. If you want results from .edu
  and .com domains only, rephrase your search like this:
  perl (site:edu | site:com)
  With the pipe character (|), you're specifying that you
  want results to come either from the .edu or the .com
  domain.
• Don't use allinurl: or allintitle: when mixing syntax. It
  takes a careful hand not to misuse these in a mixed
  search. Instead, stick to inurl: or intitle:. If you don't
  put allinurl: in exactly the right place, you'll create odd
  search results. Let's look at this example:
  allinurl:perl intitle:programming
  At first glance, it looks like you're searching for the
  string "perl" in the result URL and the word
  "programming" in the title. And you're right: this will
  work fine. But what happens if you move allinurl: to the
  right of the query?
  intitle:programming allinurl:perl
  This won't bring any results. Stick to inurl: and intitle:,
  which are much more forgiving of where you put them in
  a query.
  The same advice goes for allintext: and allinanchor:.
• Don't use so much syntax that you get too narrow, like:
  intitle:agriculture site:ucla.edu inurl:search
  You might find that it's too narrow to give you any useful
  results. If you're trying to find something so specific
  that you think you'll need a narrow query, start by
  building a little bit of the query at a time. Say you want
  to find plant databases at UCLA. Instead of starting with
  the query:
  intitle:plants site:ucla.edu inurl:database
  Try something simpler:
  databases plants site:ucla.edu
  and then try adding syntax to keywords you've already
  established in your search results:
  intitle:plants databases site:ucla.edu
  or:
  intitle:database plants site:ucla.edu
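The "either domain" rule above is the one most easily gotten wrong, so here is a minimal sketch that builds the grouped form and flags the self-cancelling one. The helper names any_site and has_conflicting_sites are hypothetical, and the check is deliberately naive: it assumes whitespace-separated terms and only looks for the pipe character or the word OR.

```python
def any_site(*domains):
    """Return a term matching pages from any of the given sites or domains."""
    if not domains:
        raise ValueError("give at least one domain")
    if len(domains) == 1:
        return f"site:{domains[0]}"
    # Parentheses plus the pipe character mean "either of these".
    return "(" + " | ".join(f"site:{d}" for d in domains) + ")"


def has_conflicting_sites(query):
    """True if the query demands more than one site: at the same time."""
    tokens = [t.strip("()") for t in query.split()]
    sites = [t for t in tokens if t.startswith("site:")]
    joined_by_or = "|" in tokens or "OR" in tokens
    return len(sites) > 1 and not joined_by_or


good = "perl " + any_site("edu", "com")
bad = "perl site:edu site:com"
```

Negated terms such as -site:edu start with a hyphen, so the check leaves them alone; excluding several domains at once is not a contradiction.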
1.6.2. How to Mix Syntax
If you're trying to narrow down search results, the intitle: and
site: syntax elements are your best bet.
1.6.2.1 Titles and sites
For example, say you want to get an idea of what databases are
offered by the state of Texas. Run this search:
intitle:search intitle:records site:tx.us
You'll find something on the order of 30 very targeted results.
And, of course, you can narrow down your search even more by
adding keywords:
birth intitle:search intitle:records site:tx.us
It doesn't seem to matter whether you put plain keywords at the
beginning or the end of the search query; I put them at the
beginning because they're easier to keep up with.
The site: syntax, unlike site syntax on other search engines,
allows you to get as general as a domain suffix (site:com) or as
specific as a domain or subdomain (site:thomas.loc.gov). So if
you're looking for records in El Paso, you can use this query:
intitle:records site:el-paso.tx.us
and you'll get approximately one result.
1.6.2.2 Title and URL
Sometimes you want to find a certain type of information, but you
don't want to narrow by type. Instead, you want to narrow by
theme of information (e.g., you want help or a search engine).
That's when you need to search in the URL.
The inurl: syntax will search for a string in the URL but won't
count finding it within a larger word. So, for example, if you
search for inurl:research, Google will not find pages from
www.researchbuzz.com, but it will find pages from
www.research-councils.ac.uk.
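The whole-word behavior just described can be modeled by splitting a URL on punctuation and comparing complete pieces. This is a sketch of the behavior as the text describes it, not of how Google actually tokenizes URLs; the function name url_contains_word is hypothetical.

```python
import re


def url_contains_word(url, word):
    """True if `word` appears in `url` as a complete, punctuation-delimited piece."""
    # Dots, slashes, hyphens, and other punctuation act as word boundaries.
    pieces = re.split(r"[^A-Za-z0-9]+", url.lower())
    return word.lower() in pieces


urls = ["www.researchbuzz.com", "www.research-councils.ac.uk"]
matches = [u for u in urls if url_contains_word(u, "research")]
```

Only the second URL survives the filter, because in the first "research" is buried inside the longer word "researchbuzz".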
Say you want to find information on neurosurgery, with an
emphasis on learning or assistance. Try:
intitle:neurosurgery inurl:help
This returns a more manageable 880 or so results. The whole
point is to get a number of results that finds what you need but
isn't so large as to be overwhelming. If you find 880 results
overwhelming, you can easily mix the site: syntax into the
search and limit your results to universities:
intitle:neurosurgery inurl:help site:edu
Beware, however, of using too much special syntax. As
mentioned earlier, you can quickly detail yourself into no results
at all.
1.6.3. The Antisocial Syntax Elements
The antisocial syntax elements are the ones that won't mix and
should be used individually for maximum effect. If you try to use
them with other syntax elements, you won't get any results.
The syntax elements that request special information (stocks:,
rphonebook:, bphonebook:, and phonebook:) are all antisocial. That
is, you can't mix them and expect to get a reasonable result.
The other antisocial syntax is link:, which shows pages that
have a link to a specified URL. Wouldn't it be great if you could
specify what domains you want the pages to be from? Sorry, you
can't. The link: syntax does not mix with anything else, not even
plain old keywords.
For example, say you want to find out what pages link to O'Reilly
Media, Inc., but you don't want to include pages from the .edu
domain. The query link:www.oreilly.com -site:edu will not work
because the link: syntax does not work in combination. Well,
that's not quite correct; you will get results, but they'll be for the
phrase "link:www.oreilly.com" from domains that are not .edu.
If you want to search for links and exclude the .edu domain,
there's no single command that will absolutely work. This one's
a good try, though:
inanchor:oreilly -inurl:oreilly -site:edu
This search looks for the word "oreilly" in anchor text, the text
that's used to define links; excludes pages that contain "oreilly"
in the URL of the search result (e.g., oreilly.com); and, finally,
excludes those pages that come from the .edu domain.
But this type of search is nowhere near complete. It finds only
those links to O'Reilly that include the string "oreilly": if
someone creates a link such as <a
href="http://perl.oreilly.com/">Camel Book</a>, it won't be found
by the preceding query. Furthermore, there are other domains
that contain the string "oreilly," and there may be domains that
link to "oreilly" that contain the string "oreilly" but aren't
oreilly.com. You could alter the string slightly, to omit the
oreilly.com site itself but not other sites containing the string
"oreilly":
inanchor:oreilly -site:oreilly.com -site:edu
However, you would still be including many O'Reilly sites (XML.com
and MacDevCenter.com, for instance) that aren't at oreilly.com.
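Because a mixed link: query fails silently (it quietly turns into a phrase search), a quick pre-flight check can save some confusion. The following is a rough sketch under stated assumptions: the set of antisocial elements is taken from the text above, terms are split on whitespace, and anything shaped like letters followed by a colon is treated as a syntax element, so a pasted http:// URL would be misread as one. The names ANTISOCIAL, operators, and mixes_antisocial are hypothetical.

```python
import re

# Elements the text describes as refusing to mix.
ANTISOCIAL = {"link", "stocks", "phonebook", "rphonebook", "bphonebook"}

# An optional leading minus, then letters, then a colon: -site:, inurl:, link: ...
OPERATOR = re.compile(r"^-?([a-z]+):")


def operators(query):
    """Return the names of the special syntax elements used in the query."""
    found = []
    for token in query.lower().split():
        match = OPERATOR.match(token)
        if match:
            found.append(match.group(1))
    return found


def mixes_antisocial(query):
    """True if an antisocial element is combined with something it can't mix with."""
    ops = operators(query)
    # link: does not mix with anything, not even plain keywords.
    if "link" in ops and len(query.split()) > 1:
        return True
    # The lookup elements take plain arguments but no other syntax elements.
    return any(op in ANTISOCIAL for op in ops) and len(ops) > 1
```

The check passes the inanchor: workaround shown above, since none of its elements is on the antisocial list.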
1.6.4. All the Possibilities
While it is possible to write down every syntax-mixing
combination and briefly explain how they might be useful, there
wouldn't be room for much else in this book.
Experiment. Experiment a lot. Constantly keep in mind that most
of these syntax elements do not stand alone, and you can get
more done by combining them than by using them individually.
Depending on what kind of research you are doing, different
patterns will emerge over time. For example, you may discover
that focusing on only PDF documents (filetype:pdf) finds you
the results you need. You may discover that you should
concentrate on specific file types in specific domains
(filetype:ppt site:tompeters.com). Mix up the syntax in as many
ways as is relevant to your research and see what you get.
As with anything else, the more you use Google's special
syntax, the more natural it will become to you. And Google is
constantly adding more, much to the delight of regular web
combers.
If, however, you want something more structured and visual than
a single query line, Google's Advanced Search should fit the bill.
1.7. Advanced Search
Google's default simple search allows you to do quite a bit, but
not everything. Google's Advanced Search page
(http://www.google.com/advanced_search?hl=en), shown in
Figure 1-1, provides more options, such as date search and
filtering, with "fill in the blank" searching options for those who
don't take naturally to memorizing special syntax.
Figure 1-1. Google's Advanced Search page
Most of the options presented on this page are self-explanatory,
but we'll take a quick look at the kinds of searches that would be
more difficult using the single-text-field interface of a simple
search.
1.7.1. Query Words
Because Google uses Boolean AND by default, it's sometimes
hard to logically build out the nuances of a particular query.
Using the text boxes at the top of the Advanced Search page,
you can specify words that must appear, exact phrases, lists of
words (at least one of which must appear), and words to be
excluded.
1.7.2. Language
Using the Language pull-down menu, you can specify what
language all returned pages must be in, from Arabic to Turkish.
1.7.3. File Format
The File Format option lets you include or exclude several
different file formats, including Microsoft Word and Excel. There
are a couple of Adobe formats (most notably PDF) and Rich Text
Format as options here, too. This is where the Advanced Search
is at its most limited; there are literally dozens of file formats
that Google can search for, and this set of options represents
only a small subset. To get at the others, use the filetype:
special syntax described earlier in "Special Syntax."
1.7.4. Date
Date allows you to specify search results updated in the last
three months, six months, or year. This date search is much
more limited than the daterange: special syntax, which can give
you results as narrow as one day, but Google stands behind the
results generated using the Date option on the Advanced
Search, while not officially sanctioning the use of the daterange:
search.
1.7.5. Occurrences
Using the Occurrences pull-down menu, you can specify where
the terms should occur. The options here, other than the default,
generally reflect the allin*: syntax elements: in the title
(allintitle:), text (allintext:), URL (allinurl:), and link anchors
(allinanchor:) of the page.
1.7.6. Domain
The Domain feature is an interface to the site: syntax. It also
allows negation, explained earlier, to explicitly not return results
from a site or domain.
1.7.7. Safe Search
Google's Advanced Search further gives you the option to filter
your results using SafeSearch. SafeSearch only filters sexually
explicit content (as opposed to some filtering systems that filter
pornography, hate material, gambling information, etc.). Please
remember that machine filtering isn't 100% perfect.
1.7.8. Additional Google Properties
The rest of the page provides individual search forms for other
Google properties, including a news search, a page-specific
search, and links to some of Google's topic-specific searches.
The news search and other topic-specific searches work
independently of the main Advanced Search form at the top of
the page.
The Advanced Search page is handy when you need to use its
unique features or you need some help in putting together a
complicated query. Its "fill-in-the-blank" interface will come in
handy for the occasional searcher or someone who wants to get
an advanced search exactly right. That said, it is limiting in
other ways; it's difficult to use mixed syntax or build a single
syntax search using OR. For example, there's no way to search
for site:edu OR site:org using the Advanced Search. This search
must be done from the Google search box.
Of course, there's another way you can alter the search results
that Google gives you, and it doesn't involve the basic search
input or the Advanced Search page. It's the preferences page,
described in "Setting Preferences," later in this chapter.
1.8. Quick Links
If you're a Google regular, you've no doubt noticed those
snippets of linked information proliferating near the top-left of
the first results page (see Figure 1-2). Where once there was
only a sponsored link or two between you and your results, now
there are spelling suggestions, news headlines, stock quotes,
and all other manner of bits and bobs of rather useful
information.
Figure 1-2. Quick links augment search results
with relevant, current, and local information
Google is going beyond Web search results to include relevant
finds from its other properties and those of third parties. Here,
briefly, is the current catalog of quick links:
Spelling
One nice side effect of Google's listening to the Web is
that it picks up a lot of words along the way. Some
appear in the dictionary, while others haven't quite made
their way into common parlance. Some are made up,
while others are simply misspelled. Query Google for
something that is commonly spelled another way, and
it'll proffer some suggestions. [Hack #9] delves further
into the wonders of Google's spell checker.
Definitions
TLAs (that's "three-letter acronyms") and geek speak
abound. Rather than smiling knowingly when you've not a
clue what someone just said, ask Google if it knows what
your friend, boss, or medical professional is talking
about. Prepend just about any word, obscure or
garden-variety, with define (e.g., define happy) and the first item
on your results page will in all probability be a definition
pulled from one of any number of Web dictionaries. Use
define: (note the colon; e.g., define:osteichthyes) and
you'll pull up a whole page full of definitions [Hack #10].
News Headlines
Google News (http://news.google.com; see Chapter 4)
scrapes stories from (at present count) 4,500 news
sources. Don't be surprised if there's something new and
noteworthy related to your Google search.
Travel Information
Before you hop on that plane, Google your destination
using the airport name (e.g., Los Angeles) or code (e.g.,
LAX) and the word airport. Click the "View conditions at
[in this case] Los Angeles International (LAX), Los
Angeles, California" link to visit the Federal Aviation
Administration's (FAA) real-time airport status
information. At the moment of this writing, LAX has no
destination-specific delays, and both departures and
arrivals are experiencing fewer than 15-minute gate hold
and airborne delays, respectively.
Street Maps
If Google gleans something looking like a geographic
location in your search query, it'll provide a link to
Yahoo! and MapQuest maps of the area.
Google Local
Include the name of a city, state, or Zip Code anywhere
in the U.S. or Canada in your search and Google Local
(http://local.google.com) [Hack #7] just might suggest a
local find. Google for indian food portland oregon and
you'll find yourself tempted by the flavors of India House
on SW Morrison Street or Wazwan on SW Fourth Street.
Google by Numbers , 1-2-3
You may remember a few important numbers from math
c las s : pi or E or C , for ins tanc e. But numbers hold a very
s pec ial plac e in G oogle's c ollec tive hearta fter all, the
name G oogle c omes from googol, or 1 .0 1 0 100. So it
s houldn't c ome as a s urpris e that the geeks at G oogle
have taught the s earc h engine to pay attention to
partic ular patterns of numbers , inc luding anything that
looks like a c alc ulation
(http://www.google.c om/help/features .html#c alc ulator)
[Hack #47] or fits a s pec ial pattern us ually found in
partic ular referenc e numbers , inc luding:
U P S, FedE x, and U .S. P os tal Servic e trac king
numbers (e.g., 1Z9999W99999999999), linking to the
pac kage s ervic e's trac king page and filling in
the number to get you going.
Vehic le I D (V I N ) numbers (e.g.,
AAAAA999A9AA99999).
U P C c odes (e.g., 073333531084) at
http://www.upc databas e.c om.
Telephone area c odes (e.g., 510) at
http://www.whitepages .c om.
P atent numbers (e.g., patent 4920273) in the U .S.
P atent D atabas e
Federal A viation A dminis tration (FA A ) airplane
regis tration numbers (e.g., n199ua), partic ularly
entertaining when you're waiting to board your
plane, s martphone in hand [Hack #67] ; look for
it on the plane's tail.
Federal C ommunic ations C ommis s ion (FC C )
equipment I D numbers (e.g., fcc B4Z-34009-PIR).
Stock Quotes
Search for a stock symbol [Hack #8] and you'll be
quick-linked to Yahoo! Finance.
Froogle Products
If Froogle (http://froogle.google.com) finds a product
that seems to be what you're after, it'll link to "Product
search results" and two or three offerings at sites like
eBay, Golfsmith, Buy.com, and many more.
There are sure to be more quick links by the time you read this.
To keep apprised of what's new, periodically visit the Google Web
Search Features (http://www.google.com/help/features.html), or
just keep Googling and see what appears.
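To make the idea of "special patterns" concrete, here is a minimal Python sketch that sorts a query into one of the reference-number types listed above. The regular expressions are hypothetical simplifications modeled only on the sample values in this section; they are not Google's actual matching rules, and the names PATTERNS and classify_query are invented for illustration.

```python
import re

# Hypothetical, simplified patterns modeled on the sample values in the text.
# They are illustrative guesses, not Google's real matching rules.
PATTERNS = [
    ("UPS tracking number", r"1Z[0-9A-Z]{16}"),
    ("vehicle ID (VIN)", r"[0-9A-Z]{17}"),
    ("UPC code", r"\d{12}"),
    ("telephone area code", r"\d{3}"),
    ("patent number", r"patent\s+\d+"),
    ("FAA registration number", r"n\d{1,5}[a-z]{0,2}"),
    ("FCC equipment ID", r"fcc\s+\S+"),
]


def classify_query(query):
    """Return the label of the first pattern matching the whole query, else None."""
    text = query.strip()
    for label, pattern in PATTERNS:
        if re.fullmatch(pattern, text, flags=re.IGNORECASE):
            return label
    return None


for sample in ["073333531084", "510", "patent 4920273", "n199ua", "indian food"]:
    print(sample, "->", classify_query(sample))
```

Matching the whole query (rather than searching inside it) keeps an ordinary search that merely contains digits from being mistaken for a reference number.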
1.9. Language Tools
In the early days of the Web, it seemed like most web pages
were in English. But as more and more countries have come
online, materials have become available in a variety of
languages, including languages that don't originate with a
particular country (such as Esperanto and Klingon).
Google offers several language tools, including one for
translation and one for Google's interface. The interface option
is much more extensive than the translation option, but the
translation has a lot to offer.
The language tools are available by clicking the "Language
Tools" link on the front page or by going to
http://www.google.com/language_tools?hl=en.
1.9.1. Search Specific Languages or
Countries
The first tool allows you to search for materials from a certain
country and/or in a certain language. This is an excellent way to
narrow your searches; searching for French pages from Japan
gives you far fewer results than searching for French pages from
France. You can narrow the search further by searching for a
slang word in another language. For example, search for the
English slang word bonce on French pages from Japan.
1.9.2. Translate
The second tool on this page allows you to translate either a
block of text or an entire web page from one language to another.
Most of the translations are to and from English.
Machine translation is not nearly as good as human translation,
so don't rely on this translation as either the basis of a search or
as a completely accurate translation of the page you're looking
at. Use it instead to give you the gist of whatever it translates.
You don't have to come to this page to use the translation tools.
When you enter a search, you'll see that some search results
that aren't in your language of choice (which you set via
Google's preferences) have "[Translate this page]" next to their
titles. Click on one of those and you'll be presented with a
framed, translated version of the page. The Google frame, at the
top, gives you the option of viewing the original version of the
page, as well as returning to the results or viewing a copy
suitable for printing.
1.9.3. Interface Language
The third tool lets you choose the interface language for Google,
from Afrikaans to Welsh. Some of these languages are imaginary
(Bork, bork, bork! and Elmer Fudd) but they do work.
Be warned that if you set your language
preference to Klingon, for example, you'll
need to know Klingon to figure out how to
set it back.
As one of our Google Hacks readers, Jacek
Artymiak, pointed out
(http://hacks.oreilly.com/pub/h/360), if
English is your native tongue, point your
browser at http://www.google.com/intl/en.
If you're not an English speaker but
remember or care to guess at the language
code (e.g., zu for Zulu), drop it in instead of
en at the end of that URL. Further
discussion revealed that simply suffixing
the http://www.google.com URL with a
period (http://www.google.com.) has the same
delocalizing effect, reverting the interface
to English.
If you're really stuck, delete the Google
cookie from your browser and reload the
page; this should reset all preferences to
the defaults.
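The URL trick in the note above amounts to simple string building. The following Python sketch (the function name interface_url is invented for illustration) assumes only what the note states: the interface URL is http://www.google.com/intl/ followed by a language code.

```python
def interface_url(language_code="en"):
    """Build the interface URL described in the note: /intl/ plus a language code."""
    return "http://www.google.com/intl/" + language_code


print(interface_url())      # English interface
print(interface_url("zu"))  # Zulu, using the code mentioned in the note
```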
How does Google manage to have so many interface languages
when they have so few translation languages? The "Google in
Your Language" program gathers volunteers from around the
world to translate Google's interface. (You can get more
information on that program at
http://www.google.com/intl/en/language.html.)
1.9.4. Local Domain
Finally, the Language Tools page contains a list of region-specific
Google home pages: over 100 of them, from Deutschland
to the Pitcairn Islands.
1.9.5. Making the Most of Google's
Language Tools
While you shouldn't rely on Google's translation tools to give
you more than the gist of the meaning (since machine
translation isn't that good), you can use translations to narrow
your searches. I described the first method earlier: use unlikely
combinations of languages and countries to narrow your results.
The second way involves using the translator.
Select a word that matches your topic and use the translator to
translate it into another language. (Google's translation tools
work very well for single-word translations like this.) Now,
search for that word in a country and language that don't match
it. For example, you might search for the German word
"Landstraße" (highway) on French pages in Canada. Of course,
you'll have to be sure to use words that don't have English
equivalents or you'll be overwhelmed with results.
Whew! By now it should be fairly clear that a simple interface
such as the one Google has on its front page does not
necessarily imply limited power. Still waters run deep indeed.
Now that we have all of the tools, tips, and techniques under our
belt to help us ask Google for what we want before it dives into
the depths of web content, it's time to turn our attention to
understanding what it brings back to the surface.
1.10. Anatomy of a Search Result
You'd think a list of search results would be pretty
straightforward, wouldn't you: just a page title and a link, possibly
a summary? Not so with Google. Google encompasses so many
search properties and has so much data at its disposal that it
fills every results page to the rafters. Within a typical search
result you can find sponsored links, ads, links to stock quotes,
page sizes, spelling suggestions, and more.
By knowing more of the nitty-gritty details of what's what in a
search result, you'll be able to make some guesses ("Wow, this
page that links to my page is very large; perhaps it's a link list")
and correct roadblocks ("I can't find my search term on this
page; I'll check the version Google has cached").
Let's use the word "flowers" to examine this anatomy. Figure 1-3
shows the result page for flowers.
Figure 1-3. Result page for "flowers"
First, you'll note at the top of the page is a selection of tabs,
allowing you to repeat your search across other Google search
categories besides web pages, including Google Groups [Hack
#1]. Beneath that you'll see a count for the number of results
and how long the search took: about 48,000,000 results in 0.61
seconds (this will vary, sometimes by quite a bit).
Sometimes you'll see results/sites called out on colored
backgrounds at the top or right of the results page (see Figure
1-3). These are called sponsored links (read: advertisements).
Google has a policy of very clearly distinguishing ads and
sticking to text-based advertising only rather than throwing
flashing banners in your face like other sites do.
Beneath the sponsored links you sometimes see a category list.
You'll see a category list only if you're searching for very general
terms and your search consists of only one word. For example, if
you searched for pinwheel flowers, Google wouldn't present the
flowers category.
Other times you'll see news stories [Chapter 4] related to your
query.
Why would you see category results? After
all, Google is a full-text search engine,
isn't it? It's because Google has taken the
information from the Open Directory
Project (http://www.dmoz.org) and crossed
it with its own popularity rankings to make
the Google Directory. When you see
categories, you're seeing information from
the Google Directory.
The first real (i.e., nonsponsored) result of the search for flowers
is shown in Figure 1-4.
Figure 1-4. A typical search result
Let's break that down into chunks, shall we?
The top line of each result is the page title, hyperlinked to the
original page.
The second line offers a brief extract from this site. Sometimes
this is a description of the site or a selected sentence or two.
Sometimes it's HTML mush. Google tends to use description
metatags when they're available; it's rare when you can look at a
Google search result and not have even a modicum of an idea
what the site is all about.
The next line sports several informative bits of metadata. First,
there's the URL; second, the size of the page (Google will have
the page size available only if the page has been cached).
There's a link to a cached version of the page if one is available.
Finally, there's a link to find similar pages.
Why would you bother reading the search-result metadata? Why
not simply visit the site and see if it has what you want?
If you've got a broadband connection and all the time in the
world, you might not want to bother with checking out the
metadata. But if you have a slower connection and time is at a
premium, consider the search-result information.
First, check the page summary. Where does your keyword
appear? Does it appear in the middle of a list of site names?
Does it appear in a way that makes it clear that the context is
not what you're looking for?
Check the size of the page if it's available. Is the page very
large? Perhaps it's just a link list (a page full of hyperlinks, as the
name suggests). Is it just 1 or 2 KB? It might be too small to find
the level of detail that you're looking for. If your aim is link lists,
be on the lookout for pages larger than 20 KB and see [Hack
#1].
Page size in Google results is never going
to be more than 101 KB. That's because
Google doesn't index more than 101 KB
worth of a given web page.
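The page-size rules of thumb above can be captured in a few lines. This Python sketch is only an illustration: the thresholds (1 or 2 KB, 20 KB, 101 KB) come from the text, while the function name and the wording of the verdicts are made up.

```python
def describe_page_size(size_kb):
    """Turn the page size shown in a search result into a rough first guess."""
    if size_kb >= 101:
        # Google reports at most 101 KB, so the real page may be bigger.
        return "at the 101 KB cap; the page may be larger than shown"
    if size_kb > 20:
        return "large; possibly a link list"
    if size_kb <= 2:
        return "very small; may lack the detail you want"
    return "moderate size"


for size in (1, 12, 35, 101):
    print(size, "KB:", describe_page_size(size))
```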
1.11. Setting Preferences
Google's Preferences page, shown in Figure 1-5, provides a
nice, easy way to set and save your searching preferences.
Figure 1-5. Google's Preferences page
1.11.1. Interface Language
You can set your Interface Language, affecting the language in
which tips and messages are displayed. Language choices range
from Afrikaans to Zulu, with plenty of odd options, including
Bork, bork, bork! (the Swedish Chef), Elmer Fudd, and Pig Latin,
thrown in for fun.
1.11.2. Search Language
Not to be confused with Interface Language, Search Language
restricts which languages are considered when searching
Google's page index. The default is any language, but you could
be interested only in web pages written in Chinese and
Japanese, or French, German, and Spanish; the combination is up
to you.
1.11.3. SafeSearch Filtering
Google's SafeSearch filtering affords you a method of avoiding
search results that may offend your sensibilities. No filtering
means you're offered anything in the Google index. Moderate
filtering rules out explicit images, but not explicit language.
Strict filtering filters both text and images. The default is
moderate filtering.
1.11.4. Number of Results
By default, Google displays 10 results per page. For more
results, click any of the "Result Page: 1 2 3..." links at the
bottom of each result page, or simply click the "Next" link.
You can specify your preferred number of results per page (10,
20, 30, 50, or 100), along with whether you want results to open
in the current window or a new browser window.
1.11.5. Settings for Researchers
For the purpose of research, it's best to have as many search
results as possible on the page. Because it's all text, it doesn't
take that much longer to load 100 results than it does to load
10. If you have a computer with a decent amount of memory, it's
also good to have search results open in a new window; it'll keep
you from losing your place and leave you a window with all the
search results readily available.
If you can stand it, leave your filtering turned off, or at least limit
the filtering to moderate instead of strict. Machine filtering is not
perfect and, unfortunately, enabling it might mean that you'll
miss something valuable. This is especially true when you're
searching for a phrase that might be caught by a filter, such as
"breast cancer."
Unless you're absolutely sure that you always want to do a
search in one language, I'd advise against setting your language
preferences on this page. Instead, alter language preferences as
needed using the Google Language Tools ["Language Tools"
earlier in this chapter].
Between the simple search, advanced search, and preferences,
you've got all the tools necessary to build the Google query to
suit your particular purposes.
If you have cookies turned off in your
browser, setting preferences in Google
isn't going to do you much good. You'll
have to reset them every time you open
your browser. If you can't have cookies
and you want to use the same preferences
every time, consider making a customized
search form.
1.12. Understanding Google URLs
If you're like most people, you usually pay little attention to the
URLs in your browser's address bar as you surf from one site to
the next. And you might choose to stick with this habit while
searching Google. You ought to know, however, that a subtle
alteration made to the URL that Google returns after a search
can be an efficient method of tweaking your result set. In fact,
there's at least one thing you can do by fiddling with (we like to
call it hacking) the URL that you can do no other way, and there
are quick tricks that might save you a trip back to the Advanced
Search page.
Say you want to search for three blind mice. The URL of the page
of results will vary depending on the preferences you've set, but
it will look something like this:
http://www.google.com/search?num=100&hl=en&q=%22three+blind+mice%2
The query itself (q=%22three+blind+mice%22, %22 being a
URL-encoded " (double quote)) is pretty obvious, but let's break down
what those extra bits mean.
The num=100 refers to the number of search results to a page:
100 in this case. Google accepts any number from 1 to 100.
Altering the value of num is a nice shortcut to altering the
preferred size of your result set without having to meander over
to the Advanced Search page and rerun your search.
Don't see the num= in your URL? Simply append it by clicking at
the end of the URL in your browser's address bar and typing it in.
To set the number of results per page to 20, for instance, you'd
add &num=20.
You can add or alter any of the modifiers
described here by appending them to the
URL or changing their values (the part after
the = (equals)) to something within the
accepted range for the modifier in
question. If you're adding a modifier, you'll
need to use an & symbol (ampersand) too.
Look at how the modifiers are joined
together on URLs for other search results
to see how it's done.
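If you'd rather script this kind of URL editing than type it into the address bar, the following Python sketch shows one way to add or change a modifier using only the standard library. The helper name set_modifier is invented for illustration; the URL and the num modifier are the ones discussed above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_modifier(url, name, value):
    """Return url with the modifier `name` set to `value`, replacing any old value."""
    parts = urlsplit(url)
    # Keep every existing modifier except the one being set.
    pairs = [(key, val)
             for key, val in parse_qsl(parts.query, keep_blank_values=True)
             if key != name]
    pairs.append((name, str(value)))
    # urlencode joins the modifiers with & and re-encodes quotes and spaces.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


url = "http://www.google.com/search?hl=en&q=%22three+blind+mice%22"
print(set_modifier(url, "num", 20))
```

Rebuilding the query string this way takes care of the ampersands for you, which is the step most easily forgotten when editing by hand.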
The hl=en means the language interface (the language in which
you use Google, reflected in the home page, messages, and
buttons) is in English. Google's Language Tools ["Language
Tools" earlier in this chapter] page provides a list of language
choices. Run your mouse over each language choice and notice
the change reflected in the URL; the URL for Pig Latin looks like
this:
http://www.google.com/intl/xx-piglatin/
The language code is the bit between intl/ and the last /
(xx-piglatin, in this case). Apply that to the search URL at hand by
altering the existing value of hl:
hl=xx-piglatin
What if you put multiple hl modifiers on a result URL? Google
honors whichever one comes last, reading from left to right.
While it makes for confusing URLs, this means that you can
always resort to laziness and add an extra modifier at the end
rather than editing what's already there, like so:
http://www.google.com/search?num=100&hl=en&q=%22three+blind+mice%2
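The "lazy" approach of tacking a second hl onto the end can be sketched in Python as follows. The helper names are invented for illustration, and effective_value merely imitates the last-one-wins behavior described above; it does not query Google.

```python
from urllib.parse import parse_qs, urlsplit


def append_modifier(url, name, value):
    """Append name=value to the end of the URL without touching what is there."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{name}={value}"


def effective_value(url, name):
    """Imitate the rule in the text: when a modifier repeats, the last one wins."""
    values = parse_qs(urlsplit(url).query).get(name)
    return values[-1] if values else None


url = "http://www.google.com/search?num=100&hl=en&q=%22three+blind+mice%22"
lazy = append_modifier(url, "hl", "xx-piglatin")
print(lazy)
print(effective_value(lazy, "hl"))
```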
There's one more modifier that, appended to your URL, may
provide some useful modifications of your results:
safe=off
Means the SafeSearch filter is off. The SafeSearch filter
removes search results of a sexually explicit nature.
safe=on means the SafeSearch filter is on.
Playing about with Google's URL may not seem like the most
intuitive way to get results quickly, but it's much faster than
reloading the Advanced Search form, and in one case (the
"months old" modifier), it's the only way to get at a particular set
of results.
Hack 1. Browse the Google Directory
Google has a searchable subject index in addition to its eight-
billion-page web search.
Google's Web Search indexes over eight billion pages, which
means that it isn't suitable for all searches. When you've got a
search that you can't narrow down, like if you're looking for
information on a person about whom you know nothing, billions of
pages will get very frustrating very quickly.
But you don't have to limit your searches to the Web. Google
also has a searchable subject index, the Google Directory, at
http://directory.google.com. Instead of indexing the entirety of
billions of pages, the directory describes sites, indexing
about 1.5 million URLs. This makes it a much better search for
general topics.
Does Google spend time building a searchable subject index in
addition to a full-text index? No, Google bases its directory on
the Open Directory Project data at http://dmoz.org/. The
collection of URLs at the Open Directory Project is gathered and
maintained by a group of volunteers, but Google does add some
of its own Googlish magic to it.
As you can see in Figure 1-6, the front of the site is organized
into several topics. To find what you're looking for, you can either
do a keyword search, or drill down through the hierarchies of
subjects.
Figure 1-6. The Google Directory
Beside most listings, a couple of which are shown in Figure 1-7,
you'll see a green bar. The green bar is an approximate indicator
of the site's PageRank in the Google search engine. (Not every
listing in the Google Directory has a corresponding PageRank in
the Google web index.) Web sites are listed in the default order of
Google PageRank, but you also have the option to list them in
alphabetical order.
Figure 1-7. Individual listings under Science >
Math > Mathematicians > Nash, John F., Jr.
One thing you'll notice about the Google Directory is how the
annotations and other information vary between the
categories. That's because the information in the directory is
maintained by a small army of thousands of volunteers who are
each responsible for one or more categories. For the most part,
annotation is pretty good.
1.13.1. Searching the Google Directory
Because the Google Directory is a far smaller collection of
URLs, ideal for more general searching, it does not have the
various complicated special syntaxes for searching that the web
search does. However, there are a couple of special syntaxes
that you should know about.
intitle:
Just like the Google web special syntax, intitle:
restricts the query word search to the title of a page.
inurl:
Restricts the query word search to the URL of a page.
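As a small illustration of how these two syntaxes combine with ordinary query words, here is a Python sketch that assembles a Directory query string. The function name directory_query and the sample words are hypothetical; only the intitle: and inurl: prefixes come from the text.

```python
def directory_query(words, title_word=None, url_word=None):
    """Join plain query words with optional intitle: and inurl: restrictions."""
    terms = list(words)
    if title_word:
        terms.append("intitle:" + title_word)
    if url_word:
        terms.append("inurl:" + url_word)
    return " ".join(terms)


# A general query, then the same query restricted by page title.
print(directory_query(["wodehouse"]))
print(directory_query(["wodehouse"], title_word="biography"))
```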
When you're searching on Google's web index, your
overwhelming concern is probably how to reduce your list of
search results to something manageable. With that in mind, you
might start with the narrowest possible search.
That's a reasonable strategy for the web index, but because you
have a narrower pool of sites in the Google Directory, you want
that search to be more general.
For example, say you were looking for information on author P. G.
Wodehouse. A simple search on P. G. Wodehouse in Google's web
index will get you over 86,000 results, possibly compelling you
to immediately narrow down your search. But doing the same
search in the Google Directory returns only 143 results. You
might consider that a manageable number of results, or you
might want to start carefully narrowing down your results further.
The Directory is also good for searching for events. A Google
web search for "Korean War" will find you a million and a half
results, while searching the Google Directory will find just over
2,830. This is a case where you will probably need to narrow
down your search. Use general words indicating what kind of
information you want: timeline, for example, or archives, or lesson
plans. Don't narrow down your search with names or
locations; that's not the best way to use the Google Directory.
Hack 2. Glean a Snapshot of Google in
Time
Google Zeitgeist provides a weekly, monthly, and yearly
overview of what the Web was interested in.
Turning to Google itself for a definition of zeitgeist
(define:zeitgeist), there's consensus that it refers to "the spirit
of the times." And Google Zeitgeist
(http://www.google.com/press/zeitgeist.html) is just that: a
mirror that the Web (according to Google) holds up to us,
providing a snapshot of the week, month, or year that was.
A typical weekly Google Zeitgeist (Figure 1-8) lists the top 10
gaining and declining queries and some hand-picked statistics
(e.g., top Google News queries, popular sequels), fun facts (e.g.,
Tour de France versus Wimbledon), aggregate information
gleaned about Googlers (e.g., operating systems, web browsers,
languages), and any other trends that the Zeitgeist crew cares
to delve into.
Figure 1-8. The week's top 10 gaining and
declining queries
It takes only a few moments of visiting Google Zeitgeist before
you're itching to go back a little further in time: the week your
second child was born, the month of the Olympics, the year you
graduated from high school. Click the "Archived information
available here" link to browse the Google Zeitgeist Archive
(Figure 1-9) of updates for every week, month, and year since
January 2001.
Weekly Zeitgeist updates actually started
in June 2001 at the same time the
monthlies switched from PDF to HTML
format.
Figure 1-9. The Zeitgeist Archive holds weekly,
monthly, and yearly updates from January 2001
to today
The monthlies and year-ends provide more detail with trend
graphs and also further break down searching by country, from
Korea to Canada and points in between.
While Google Zeitgeist's statistics aren't earth shattering (e.g.,
"Searches for 'iraq' more than doubled on March 19, the date
that Operation Iraqi Freedom began"; imagine that!), it does
provide you a snapshot of what the world in aggregate (55 billion
searches in 2003) found interesting enough to look up.
1.14.1. See Also
If Google Zeitgeist piques your interest, you might also
try the Yahoo! Buzz Index (http://buzz.yahoo.com), a
similar collection of statistics around popular Yahoo!
Searches: the day's top movers (overall and by various
Yahoo! categories), most viewed and emailed Yahoo!
news items, and a market trend-like chart (click the
"View Complete Chart..." link associated with any of the
buzz listings on the front page) of leaders and movers,
according to buzz score
(http://help.yahoo.com/help/us/buzz/#buzz-04).
Hack 3. Graph Google Results over Time
Use Google as a trend watcher.
As of November 2004, Google's index contains a whopping eight
billion pages and growing. And it doesn't just record pages; it's
filled with news and events, commentary and discussion,
changes and trends. You might think of Google as a mirror that
we hold up to the Web that approximates how we define and
represent ourselves and our world.
It should come as no surprise, then, that people spend an awful
lot of time and energy watching Google results in an attempt to
spot emerging topics and track trends. If you've been tapped to
do this for your company, product, project, or service, G-Metrics
(http://g-metrics.com) might be right up your alley. G-Metrics
measures the occurrence of a keyword or set of keywords
defined by you across time, complete with graphs.
Register with G-Metrics (registration requires only your name
and email address) for a login key. Once logged in, you can set
queries, alter, remove, or review your queries and the results
they've captured. Figure 1-10 shows my current watchlist, each
query sporting a result count and percentage change over time.
Figure 1-10. G-Metrics watchlist results
Click a query for a trend graph from the time you added the
search, counts for the past seven days, and Google's current top
10 results for that query, as shown in Figure 1-11.
Figure 1-11. G-Metrics trend graphing and
details for a particular query
We also show you how to track result counts over time [Hack
#25], but G-Metrics takes this further, allowing you to monitor
trends without a lot of legwork; your queries are "set it and
forget it." You can even subscribe to an RSS feed of the results
of any one of your queries. Sure, you could set up Google Alerts
[Hack #59], feed the numbers into a spreadsheet, and do the
graphing yourself, but why?
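For the curious, the "percentage change over time" figure that a watchlist like this reports is easy to compute by hand. The Python sketch below is only an illustration of the arithmetic: the function name is invented, the sample counts are hypothetical, and nothing here is known about how G-Metrics itself calculates its numbers.

```python
def percent_change(counts):
    """Percentage change between the first and last recorded result counts."""
    first, last = counts[0], counts[-1]
    if first == 0:
        return None  # change relative to zero is undefined
    return (last - first) / first * 100


# Hypothetical daily result counts for a single watched query.
daily_counts = [1200, 1260, 1500]
print(percent_change(daily_counts))
```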
Hack 4. Visualize Google Results
The TouchGraph Google Browser is the perfect Google
complement for those who appreciate visual displays of
information.
Some people are born text crawlers. They can retrieve the
mostly text resources of the Internet and browse them happily
for hours. But others are more visually oriented and find that the
flat text results of the Internet leave something to be desired,
especially when it comes to search results.
If you're the type who appreciates visual displays of information,
you're bound to like the TouchGraph Google Browser
(http://www.touchgraph.com/TGGoogleBrowser.html). This Java
applet allows you to start with the pages that are similar to one
URL, and then expand outward to pages that are similar to the
first set of pages, on and on, until you have a giant map of nodes
(a.k.a. URLs) on your screen.
The TouchGraph Google Browser was created by Alex Shapiro
(http://www.touchgraph.com/).
Note that what you're finding here are URLs that are similar to
another URL. You aren't doing a keyword search, and you're not
using the link: syntax. You're searching by Google's measure of
similarity.
1.16.1. Starting to Browse
Start your journey by entering a URL on the TouchGraph home
page and clicking the "Graph It" link. Your browser will launch
the TouchGraph Java applet, covering your window with a large
mass of linked nodes, as shown in Figure 1-12.
Figure 1-12. Mass of linked nodes generated by
TouchGraph
You'll need a web browser capable of
running Java applets. If Java support in
your preferred browser comes in the form
of a plug-in, your browser should have the
smarts to launch a plug-in
locator/downloader and walk you through
the installation process.
If you're easily entertained like me, you might amuse yourself
for a while just by clicking and dragging the nodes around. But
there's more to do than that.
1.16.2. Expanding Your View
Hold your mouse over one of the items in the group of pages. You'll
notice that a little box with an H pops up. Click on that and you'll
get a box of information about that particular node, as shown in
Figure 1-13.
Figure 1-13. Node information pop-up box
The box of information contains title, snippet, and URL: pretty
much everything you'd get from a regular search result. Click on
the URL in the box to open that URL's web page itself in another
browser window.
Not interested in visiting web pages just yet? Want to do some
more search visualization? Double-click on one of the nodes.
TouchGraph uses the API to request from Google pages similar
to the URL of the node you double-clicked. Keep double-clicking
at will; when no more pages are available, a green C will appear
when you put your mouse over the node (no more than 30
results are available for each node). If you do it often enough,
you'll end up with a whole screen full of nodes with lines denoting
their relationship to one another, as Figure 1-14 shows.
Figure 1-14. Node mass expanded by double-
clicking on nodes
1.16.3. Visualization Options
Once you've generated similarity page listings for a few different
sites, you'll find yourself with a pretty crowded page.
TouchGraph has a few options to change the look of what you're
viewing.
For each node, you can show page title, page URL, or point (the
first two letters of the title). If you're just browsing page
relationships, the title's probably best. However, if you've been
working with the applet for a while and have mapped out a
plethora of nodes, the point or URL options can save some
space. The URL option removes the www and .com from the URL,
leaving the other domain suffixes. For example, www.perl.com
will show as perl, while www.perl.org shows as perl.org.
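The URL-shortening rule just described can be mimicked in a few lines of Python. This is a guess at the behavior based solely on the two examples above (the function name url_label is invented, and TouchGraph's real code may handle other cases differently).

```python
def url_label(hostname):
    """Drop a leading "www." and a trailing ".com", keeping other suffixes."""
    label = hostname
    if label.startswith("www."):
        label = label[len("www."):]
    if label.endswith(".com"):
        label = label[:-len(".com")]
    return label


for host in ("www.perl.com", "www.perl.org"):
    print(host, "->", url_label(host))
```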
Speaking of saving space, there's a zoom slider on the upper-right
side of the applet window. When you've generated several
distinct groups of nodes, zooming out allows you to see the
different groupings more clearly. However, it also becomes
difficult to see relationships between the nodes in the different
groups.
TouchGraph offers the option to view the singles, the nodes in a
group that have a relationship with only one other node. This
option is off by default; check the Show Singles checkbox to turn
it on. I find it's better to leave them out; they crowd the page
and make it difficult to establish and explore separate groups of
nodes.
The Radius setting specifies how many nodes will show around
the node you've clicked on. A radius of 1 will show all nodes
directly linked to the node you've clicked, a radius of 2 will show
all nodes directly linked to the node you've clicked as well as all
nodes directly linked to those nodes, and so on. The higher the
radius, the more crowded things get. The groupings do, however,
tend to settle themselves into nice little discernible clumps, as
shown in Figure 1-15.
Figure 1-15. Node mass with Radius set to 4
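The Radius setting described above behaves like a breadth-first walk that stops after a fixed number of hops. The Python sketch below illustrates that idea on a tiny made-up graph; the function name, the graph, and the node names are all hypothetical, and this is not TouchGraph's actual code.

```python
from collections import deque


def nodes_within_radius(graph, start, radius):
    """Return every node reachable from start in at most `radius` hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the chosen radius
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


# A tiny made-up similarity graph: each key lists its directly linked nodes.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}
print(sorted(nodes_within_radius(graph, "a", 1)))
print(sorted(nodes_within_radius(graph, "a", 2)))
```

Each extra unit of radius pulls in one more ring of neighbors, which is why higher settings crowd the display so quickly.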
A drop-down menu beside the Radius setting specifies how many
search results (i.e., how many connections) are shown. A setting
of 10 is, in my experience, optimal.
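To make the Radius setting concrete, here is a minimal Python sketch that treats the similarity map as a plain graph and collects every node within a given number of hops of the clicked node. The graph data and the function name nodes_within_radius are invented for illustration; how TouchGraph itself stores and walks its nodes is not documented here.

```python
from collections import deque


def nodes_within_radius(graph, start, radius):
    """Return the set of nodes at most `radius` links away from `start`.

    `graph` maps each node to the nodes it links to (hypothetical data;
    this is a breadth-first walk, not TouchGraph's own implementation).
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand past the requested radius
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


# Made-up similarity links between five sites.
links = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}
print(sorted(nodes_within_radius(links, "a", 1)))
print(sorted(nodes_within_radius(links, "a", 2)))
```

Each extra step of radius pulls in the neighbors of everything already shown, which is why the display gets crowded so quickly at higher settings.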
1.16.4. Making the Most of These
Visualizations
Yes, it's cool. Yes, it's unusual. And yes, it's fun dragging those
little nodes around. But what exactly is the TouchGraph good
for?
TouchGraph does two rather useful things. First, it allows you to
see at a glance the similarity relationship between large groups
of URLs. You can't do this with several flat results to similar URL
queries. Second, if you do some exploration you can sometimes
get a list of companies in the same industry or area. This comes
in handy when you're researching a particular industry or topic.
It'll take some exploration, though, so keep trying.
Hack 5. Check Your Spelling
Google sometimes takes the liberty of "correcting" what it
perceives is a spelling error in your query.
If you've ever used other Internet search engines, you'll have
experienced what I call stupid spellcheck. That's when you enter
a proper noun and the search engine suggests a completely
ludicrous query ("Elvish Parsley" for "Elvis Presley"). Google's
quite a bit smarter than that.
When Google thinks it can spell individual words or complete
phrases in your search query better than you can, it'll suggest a
"better" search, hyperlinking it directly to a query. For example,
if you search for hydrecefallus, Google will ask if you meant
hydrocephalus, as shown in Figure 1-16.
Figure 1-16. Google offers spelling suggestions
when it thinks it knows better
Suggestions aside, Google will assume that you know of what
you speak and return your requested results, provided that your
query gleaned results.
If your query found no results for the spellings you provided and
Google believes it knows better, it will automatically run a new
search of its own suggestions. Thus, a search for hydrecefallus
finding (hopefully) no results will spark a Google-initiated search
for hydrocephalus.
Given the sheer number of pages on the
Web and the odds that at least one of the
people proffering a page on the subject
you're after either can't spell or can't type,
I don't see these automatically generated
searches based on Google's suggestions
coming up that often these days.
For instance, because two web pages cite
this hack as it first appeared in the
previous edition of this title, the
hydrecefallus example is blown. And I
couldn't find another misspelling that both
came up short on results and for which
Google had any suggestions.
On the other hand (at least for now), a search
for spodding oil texas turns up only 4
results, while the same search with correct
spelling ("spudding"), spudding oil texas,
returns 708.
Mind you, Google does not arbitrarily come up with its
suggestions, but builds them based on its own database of words
and phrases found while indexing the Web. If you search for
nonsense like garafghafdghasdg, you'll get no results and be
offered no suggestions.
This is a lovely side effect and a quick and
easy way to check the relative frequency
of spellings. Query for a particular
spelling, making note of the number of
results. Then click on Google's suggested
spelling and note the number of results.
It's surprising how close the counts are
sometimes, indicating an oft-misspelled
word or phrase.
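Once you have noted the two result counts, the comparison is simple arithmetic. The Python sketch below computes what fraction of the combined results belongs to the misspelling; the function name is made up, and the counts are the "spodding" and "spudding" figures quoted earlier in this hack, typed in by hand rather than fetched from Google.

```python
def misspelling_share(misspelled_count, correct_count):
    """Fraction of all results that use the misspelled form.

    Both counts are result totals you noted by hand from the two
    searches; nothing here talks to Google.
    """
    total = misspelled_count + correct_count
    if total == 0:
        raise ValueError("both counts are zero; nothing to compare")
    return misspelled_count / total


# Counts reported in the text for "spodding oil texas" (4)
# and "spudding oil texas" (708).
share = misspelling_share(4, 708)
print(f"misspelled share: {share:.1%}")
```

A share close to one half is the "surprisingly close" case: a word or phrase that is misspelled nearly as often as it is spelled correctly.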
1.17.1. Embrace Misspellings
Don't make the mistake of automatically dismissing the
proffered results from a misspelled word, particularly a proper
name. I've been a fan of cartoonist Bill Mauldin for years now,
but I repeatedly misspell his name as "Bill Maudlin." And
judging from a quick Google search, I'm not the only one. There
is no law stating that every page must be spellchecked before it
goes online, so it's often worth taking a look at results despite
misspellings.
As an experiment, try searching for two misspelled words on a
related topic, like ventriculostomy hydrocephalis. What kind of
information did you get? Could the information you got, if any, be
grouped into a particular online genre?
At the time of this writing, the search for ventriculostomy
hydrocephalis gets only 10 results. The content here is generally
from people dealing with various neurosurgical problems. Again,
there is no law that states that all web materials have to be
spellchecked.
Use this to your advantage as a researcher. When you're looking
for laymen's accounts of illness and injury, the content you desire
might actually be more often misspelled than not. On the other
hand, when looking for highly technical information or references
from credible sources, filtering out misspelled queries will bring
you closer to the information you seek.
Hack 6. Google Phonebook: Let Google's
Fingers Do the Walking
Google makes an excellent phonebook, even to the extent of
doing reverse lookups.
Google combines residential and business phone number
information and its own excellent interface to offer a phonebook
lookup that provides listings for businesses and residences in
the United States. However, the search offers three different
syntaxes, different levels of detail in your query produce different
results, the syntaxes are finicky, and Google doesn't provide
documentation.
1.18.1. The Three Syntaxes
Google offers three ways to search its phonebook:
phonebook
Searches the entire Google phonebook
rphonebook
Searches residential listings only
bphonebook
Searches business listings only
The result page for phonebook: lookups lists
only five results, residential and business
combined. The more specific rphonebook:
and bphonebook: searches provide up to 30
results per page. For a better chance of
finding what you're looking for, use the
appropriate targeted lookup.
1.18.2. Using the Syntaxes
Using a standard phonebook requires knowing quite a bit of
information about what you're looking for: first name, last name,
city, and state. Google's phonebook requires no more than last
name and state to get it started. Casting a wide net for all the
Smiths in California is as simple as:
phonebook:smith ca
Try giving 411 a whirl with that request! Figure 1-17 shows the
results of the query.
Figure 1-17. A phonebook: result page
Notice that, while intuition might tell you that there are
thousands of Smiths in California, the Google phonebook says
that there are only 600. Just as Google's regular search engine
maxes out at 1,000 results, its phonebook maxes out at 600.
Fair enough. Try narrowing down your search by adding a first
name, city, or both:
phonebook:john smith los angeles ca
At the time of this writing, the Google phonebook found 2
business and 20 residential listings for John Smith in Los
Angeles, California.
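The two lookups above follow one pattern: an operator, a colon, then optional first name, last name, optional city, and state. Here is a minimal Python sketch that assembles such a query and wraps it in a search URL. The function names are invented for this example, and the URL form (www.google.com/search with a q parameter) is an assumption based on ordinary Google search URLs, not something this hack documents.

```python
from urllib.parse import urlencode

# The three operators listed under "The Three Syntaxes".
PHONEBOOK_OPERATORS = ("phonebook", "rphonebook", "bphonebook")


def phonebook_query(last_name, state, first_name="", city="",
                    operator="phonebook"):
    """Assemble a phonebook lookup string (hypothetical helper).

    The terms are lowercased only to match the examples in the text.
    """
    if operator not in PHONEBOOK_OPERATORS:
        raise ValueError(f"unknown phonebook operator: {operator!r}")
    parts = [first_name, last_name, city, state]
    terms = " ".join(p.strip().lower() for p in parts if p.strip())
    return f"{operator}:{terms}"


def search_url(query):
    """Wrap a query in a Google search URL (assumed URL layout)."""
    return "http://www.google.com/search?" + urlencode({"q": query})


print(phonebook_query("Smith", "CA"))
print(phonebook_query("Smith", "CA", first_name="John", city="Los Angeles"))
print(search_url(phonebook_query("Smith", "CA")))
```

Keeping the operator as a parameter makes it easy to switch to rphonebook or bphonebook when the combined lookup returns too few listings.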
1.18.3. Caveats
The phonebook syntaxes are powerful and useful, but they can
be difficult to use if you don't remember a few things about how
they work.
The syntaxes are case sensitive.
Searching for phonebook:john doe ca works, while
Phonebook:john doe ca (notice the capital P) doesn't.
Wildcards don't work.
Then again, they're not needed, since the Google
phonebook does all the wildcarding for you. For example,
if you want to find shops in New York with "Coffee" in the
title, don't bother trying to envision every permutation of
"Coffee Shop," "Coffee House," and so on. Just search
for bphonebook:coffee new york ny and you'll get a list of
any business in New York whose name contains the word
"coffee."
Exclusions don't work.
Perhaps you want to find coffee shops that aren't
Starbucks. You might think phonebook:coffee -starbucks
new york ny would do the trick. After all, you're searching
for coffee and not Starbucks, right? Unfortunately not;
Google thinks you're looking for both the words "coffee"
and "starbucks," yielding just the opposite of what you
were hoping for: everything Starbucks in NYC.
OR doesn't always work.
You might start wondering if Google's phonebook
accepts OR lookups. You then might experiment, trying to
find all the coffee shops in Rhode Island or Hawaii:
bphonebook:coffee (ri | hi). Unfortunately that doesn't
work; the only listings you'll get are for coffee shops in
Hawaii. That's because Google doesn't appear to see
the (ri | hi) as a state code, but rather as another
element of the search. So if you reversed your search
above, and searched for coffee (hi | ri), Google would
find listings that contained the string "coffee" and either
the string "hi" or the string "ri." So you'll find Hi-Tide
Coffee (in Massachusetts) and several coffee shops in Rhode
Island. It's neater to use OR in the middle of your query,
and then specify your state at the end. For example, if
you want to find coffee shops that sell either donuts or
bagels, this query works fine: bphonebook:coffee (donuts |
bagels) ma. That finds stores in Massachusetts whose names
contain the word "coffee" and either the word "donuts" or
the word "bagels." The bottom line: you can
use an OR query on the store or resident name, but not on
the location.
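The caveats above are mechanical enough to check before you run a query. The following Python sketch flags the four problems described: a capitalized operator, a wildcard, an exclusion, and an OR group in the location position. It is a rough checker written for this illustration (the function name and the warning wording are made up), and it only inspects the query string; it knows nothing about what Google actually returns.

```python
import re

OPERATORS = ("phonebook:", "rphonebook:", "bphonebook:")


def phonebook_warnings(query):
    """Return a list of warnings for a phonebook query string.

    Hypothetical helper: it encodes only the caveats listed in the
    text and does not contact Google.
    """
    operator, sep, terms = query.partition(":")
    if not sep or operator.lower() + ":" not in OPERATORS:
        return ["query does not start with a phonebook operator"]
    warnings = []
    if operator != operator.lower():
        warnings.append("operator must be lowercase")
    if "*" in terms:
        warnings.append("wildcards are not supported")
    if any(word.startswith("-") for word in terms.split()):
        warnings.append("exclusions are treated as ordinary words")
    # An OR group at the very end sits where the state should be.
    if re.search(r"\([^)]*\|[^)]*\)\s*$", terms):
        warnings.append("OR on the location does not work; put the state last")
    return warnings


for q in ("phonebook:john doe ca",
          "Phonebook:john doe ca",
          "phonebook:coffee -starbucks new york ny",
          "bphonebook:coffee (ri | hi)",
          "bphonebook:coffee (donuts | bagels) ma"):
    print(q, "->", phonebook_warnings(q) or "looks fine")
```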
Try some phonebook lookups that
you couldn't do by dialing 411. For
example, try searching by last
name and area code, or last name
and Zip Code! Google's phonebook
lookup is very accommodating.
1.18.4. Reverse Phonebook Lookup
All three phonebook syntaxes support reverse lookup, though
it's probably best to use the general phonebook: syntax so that
you don't miss a listing because of its residential or
business classification.
To do a reverse search, just enter the phone number with area
code. Lookups without area code won't work.
phonebook:(707) 827-7000
(This is the phone number of O'Reilly world headquarters in
Sebastopol, California, USA.)
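Since a reverse lookup fails without an area code, it can help to normalize a number before searching. This Python sketch accepts a U.S. number in any common punctuation, insists on ten digits, and formats it like the example above. The function name is invented, and the "(NNN) NNN-NNNN" layout simply mirrors the example; whether Google accepted other layouts is not stated in the text.

```python
import re


def reverse_lookup_query(number, operator="phonebook"):
    """Build a reverse-lookup query from a U.S. phone number.

    Hypothetical helper: requires an area code (ten digits), allowing
    an optional leading country code 1.
    """
    digits = re.sub(r"\D", "", number)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        raise ValueError("need a ten-digit number including area code")
    return f"{operator}:({digits[:3]}) {digits[3:6]}-{digits[6:]}"


print(reverse_lookup_query("707-827-7000"))
print(reverse_lookup_query("1 (707) 827 7000"))
```

Rejecting seven-digit input up front saves a wasted search, since a lookup without the area code returns nothing anyway.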
Note that reverse lookups on Google are a hit-or-miss
proposition and don't always produce results. If you're not
having any luck, consider using a more dedicated phonebook
site such as WhitePages.com (http://www.whitepages.com/).
1.18.5. Finding Phonebooks Using Google
While Google's phonebook is a good starting point, its
usefulness is limited. If you're looking for a phone number at a
university or other large institution, while you won't find the
number in Google, you certainly can find the appropriate
phonebook, if it's online.
If you're looking for a university phonebook, try this simple
search first: inurl:phone site:university.edu, replacing
university.edu with the domain of the university you're looking
for. For example, to find the online phonebook of the University of
North Carolina at Chapel Hill, you'd search for:
inurl:phone site:unc.edu
If that doesn't work, there are several variations you can try,
again substituting your preferred university's domain for unc.edu:
intitle:"phone book" site:unc.edu
(phonebook | "phone book") lookup faculty staff site:unc.edu
inurl:help (phonebook | "phone book") site:unc.edu
If you're looking for several university phonebooks, try the same
search with the more generic site:edu rather than a specific
university's domain. There are also web sites that list university
phonebooks, one of which is the Phonebook Gateway Server
Lookup (http://www.uiuc.edu/cgi-bin/ph/lookup), with over 330
phonebooks.
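The university searches in this section all follow the same template with a different domain plugged in, so they are easy to generate. The Python sketch below returns the first search plus the three variations for any domain; pass edu to get the generic site:edu form. The function name is made up, and the title search is written with Google's intitle: operator.

```python
def university_phonebook_queries(domain):
    """Return the phonebook-hunting queries for one site: restriction.

    Hypothetical helper; `domain` is something like "unc.edu", or just
    "edu" to search across all universities.
    """
    templates = [
        'inurl:phone',
        'intitle:"phone book"',
        '(phonebook | "phone book") lookup faculty staff',
        'inurl:help (phonebook | "phone book")',
    ]
    return [f"{template} site:{domain}" for template in templates]


for query in university_phonebook_queries("unc.edu"):
    print(query)
```

Trying the queries in order, from the simplest to the most specific, matches the "try this first, then the variations" advice above.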
Hack 7. Think Global, Google Local
Take web searching to the streets: your street, in fact. Google
Local narrows down all those zillions of results to those within
range of a particular city, state, or postal code.
While the Web and Google have taught us to think global when it
comes to looking for information, web searches often fail in the
simple task of finding things in our own backyards. Sure, the
island of Celebes is the home to Sulawesi Kalossi, but where can
I find the finest cup of Sulawesi coffee within walking distance?
And even more importantly: do they have free wireless Internet
access?
That's not to say that Google isn't paying attention to any
mention of locale in your queries. If you were, let's say, to
search for scooters san francisco, you would notice a set of local
San Francisco finds ["Quick Links" earlier in this chapter] at the
top of the results page. As you can see in Figure 1-18, Google
also provides addresses, phone numbers, and mileage (from the
center of San Francisco, presumably).
Figure 1-18. Local finds sometimes appear as
"magic links" at the top of the results page
Google combines its index with data gleaned from the Yellow
Pages to zero in on local results that very often prove
interesting and useful.
This data is so interesting, in fact, that Google has taken the
service beyond that sprinkling of magic links, launching Google
Local (http://local.google.com), a location-aware frontend to the
Google search engine. The Google Local home page (Figure 1-19)
looks very much like what you're used to from Google, the
only real difference being that there are two search boxes
instead of just the one: What and Where. In the What box, you
type your search query as usual. In the Where box, you can
localize your search by providing a city (by itself, if the city is
unambiguously well known, e.g., San Francisco or New York, not
Rome or Concord) and a state name or Zip Code.
Figure 1-19. The Google Local home page
Before you get too excited about finding
that perfect coffee shop on the island of
Celebes, you should know that Google
Local searches the United States and
Canada only. Don't get too used to that
limitation, though: Google is planning on
expanding.
Before you click the Google Search button, notice that a
"Remember this location" checkbox is checked by default so
that the next time you visit, Google will fill in your preferred
locale for you.
You can change the Where at any time and,
as long as that "Remember this location"
checkbox is checked, Google will
remember for next time. If, for some
reason, you'd like to clear the Where
completely, just point your browser at
http://local.google.com/local?sl=1.
My query for scooters san francisco turned up a nice collection of
scooter shops, service centers, and other motorcycle- and
scooter-focused results in and around San Francisco, as shown
in Figure 1-20. Notice that each of the results is assigned a
letter (e.g., San Francisco Scooter Centre is "A") associated
with a pin in the map of the area to the right. Each result, as with
the magic links, has associated address, phone number, and
mileage information; there's also a link to driving directions over
at MapQuest.com.
Figure 1-20. Google Local results
You can further constrain the search using the "Search within: 1
mile - 5 miles - 15 miles - 45 miles" links in the Google Local
toolbar or by dropping down into one of the "Show only"
categories listed beneath the query fields at the top of the page.
Click one of the results and you're taken not to the site itself (in
fact, the business or service may not even have a web site ...
shocking, I know!), but to further detail. If the business or
service does indeed have a web presence, it's likely to be the
first of the references listed. But this isn't necessarily so; for
instance, while sfscootercentre.com is indeed the online home of
the San Francisco Scooter Centre, San Francisco Moto &
Scooter doesn't live at scooter.com (as is shown in one of the
two results for the scooter shop), but at sfmoto.com.
As you can see in Figure 1-21, the map zeros in on only that one
result and Google appends References to the bottom of the page.
These are sites that refer to (and I don't just mean link to) the
search result that you're focused on.
Figure 1-21. A typical Google Local result,
complete with map and references
You can pan around by clicking anywhere
on any map to pull that bit into the center
or clicking the N, S, E, W links to the right
of the map. Zoom using the Zoom In or
Zoom Out links or jump directly to a
particular scale (e.g., Street or City) by
clicking any link on the zoom numbered
scale. Click the Larger Map link at the
bottom of any map for a more detailed view
of the area or Driving Directions to find
your way there, no matter where you are.
While Google Local is still in beta at the time of this writing, it
certainly seems to have promise. About the only thing missing
at this point is the ability to narrow a locale to the area around a
particular address rather than just city or Zip Code.
1.19.1. See Also
Yahoo Local (http://local.yahoo.com) actually goes even
more local than Google Local, supporting full addresses
rather than just city, state, and Zip Code.
Hack 8. Track Stocks
A well-crafted Google query will usually net you company
information beyond that provided by traditional stock services.
Among the pantheon of lesser-known Google syntaxes is
stocks:. Searching for stocks:symbol, where symbol represents the
stock you're looking for, will redirect you to Yahoo! Finance
(http://finance.yahoo.com/) for details. The Yahoo! page is
actually framed by Google; off to the top-left is the Google logo,
along with links to Quicken, Fool.com, MSN MoneyCentral, and
other financial sites.
Feed Google a bum stocks: query and you'll still find yourself at
Yahoo! Finance, usually staring at a quote for a stock that you've
never heard of or a "Stock Not Found" page. Of course, you can
use this to your advantage. Enter stocks: followed by the name of
a company you're looking for (e.g., stocks:friendly). If the
company's name is more than one word, choose the most distinctive
word. Run your query and you'll arrive at the Yahoo! Finance
stock lookup page shown in Figure 1-22.
Figure 1-22. Yahoo! Finance stock lookup page
Notice the Look Up button; click it and you'll be offered a list of
companies that match "friendly" in some way. From there you
can get the stock information that you want (assuming the
company you wanted is on the list).
1.20.1. Beyond Google for Basic Stock
Information
Google isn't particularly set up for basic stock research. You'll
have to do your initial groundwork elsewhere, returning to Google
armed with a better understanding of what you're looking for. I
recommend going straight to Yahoo! Finance
(http://finance.yahoo.com) to quickly look up stocks by symbol
or company name. There you'll find all the basics: quotes,
company profiles, charts, and recent news. For more in-depth
coverage, I heartily recommend Hoovers
(http://www.hoovers.com). Some of the information is free. For
more depth, you'll have to pay a subscription fee.
1.20.2. More Stock Research with Google
Try searching Google for:
"Tootsie Roll"
Now add the stock symbol, TR, to your query:
"Tootsie Roll" TR
Aha! Instantly the search results shift to financial information.
Now, add the name of the CEO:
"Tootsie Roll" TR "Melvin Gordon"
You'll end up with a nice, small, targeted list of results, as shown
in Figure 1-23.
Figure 1-23. Using a stock symbol to limit
results
Stock symbols are great "fingerprints" for Internet research.
They're consistent, they often appear along with the company
name, and they're usually enough to narrow down your search
results to relevant information.
There are also several words and phrases that you can use to
narrow down your search for company-related information.
Replacing company with the name of the company you're looking
for, try these:
For press releases: "company announced", "company
announces", "company reported"
For financial information: company "quarterly report",
company SEC, company financials, company "p/e ratio"
For location information: company parking airport
location (doesn't always work, but sometimes works
amazingly well)
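The query patterns above are fill-in-the-blank templates, so a short Python sketch can expand them for any company name. The function name and the grouping keys are invented for this illustration; quoting a multiword company name in the unquoted templates is an added assumption, chosen so the name is searched as a phrase.

```python
def company_queries(company):
    """Expand the company-research query templates for one company.

    Hypothetical helper; it only builds query strings and does not
    run any searches.
    """
    # Assumption: treat a multiword name as a phrase where the
    # template does not already quote it.
    name = f'"{company}"' if " " in company else company
    return {
        "press releases": [
            f'"{company} announced"',
            f'"{company} announces"',
            f'"{company} reported"',
        ],
        "financial": [
            f'{name} "quarterly report"',
            f"{name} SEC",
            f"{name} financials",
            f'{name} "p/e ratio"',
        ],
        "location": [
            f"{name} parking airport location",
        ],
    }


for category, queries in company_queries("Tootsie Roll").items():
    print(category)
    for query in queries:
        print("   ", query)
```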
Hack 9. Consult the Dictionary
Google, in addition to its own spellchecking index, provides
hooks into Dictionary.com.
Google's spellchecking [Hack #5] is built on its own word and
phrase database, gleaned while indexing web pages. Thus, it
provides suggestions for lesser-known proper names, phrases,
common sentence constructs, etc. Google also offers a
definition service powered by Dictionary.com
(http://www.dictionary.com). Such definitions, coming from a
credible source and augmented by various specialty indexes,
can be more limited.
Run a search. On the results page, you'll notice the phrase
"Searched the web for [query words]." If the query words would
appear in a dictionary, they will be hyperlinked to a dictionary
definition. Identified phrases will be linked as a phrase; for
example, the query "jolly roger" will allow you to look up the
phrase "jolly roger." On the other hand, the phrase "computer
legal" will allow you to look up the separate words "computer"
and "legal."
The definition search will sometimes fail on obscure words, very
new words, slang, and technical vocabularies (otherwise known
as jargon). If you search for a word's meaning and Google can't
help you, try enlisting the services of a meta-search dictionary,
like OneLook (http://www.onelook.com/), which indexes over six
million words from over 1,000 dictionaries. If that doesn't work,
try Google again with one of the following tricks, queryword being
the word you want to find:
If you're searching for several words (you're reading a
technical manual, for example), search for them at the
same time. Sometimes you'll find a glossary this way.
For example, maybe you're reading a book about
marketing, and you don't know many of the words. If you
search for storyboard stet SAU, you'll get only a few
search results, and they'll all be glossaries.
Try searching for your word and the word glossary, say,
stet glossary. Be sure to use an unusual word; you may
not know what a "spread" is in the context of marketing
but searching for spread glossary will get you over two
million results for many different kinds of glossaries. See
[Hack #20] for language translation.
Try searching for the phrase queryword means or the words
What does queryword mean?.
If you're searching for a medical or a technical item,
narrow your search to educational (.edu) sites. If you
want a contextual definition for using equine
acupuncture and how it might be used to treat laminitis,
try "equine acupuncture" laminitis.
Adding site:edu will give you a brief list of results. Furthermore,
you'll avoid book lists and online stores, which is handy
if you're seeking information and don't necessarily want
to purchase anything. If you're searching for slang, try
narrowing your search to sites like Geocities and Tripod
and see what happens. Sometimes young people post
fan sites and other informal cultural collections on free
places like Geocities, and using these, you can find
many examples of slang in context instead of dry lists of
definitions. There are an amazing number of glossaries
on Geocities; search for glossary site:geocities.com, and
see for yourself.
Google's connection with Dictionary.com means that simple
definition checking is fast and easy. But even more obscure
words can be quickly found if you apply a little creative thinking.
Hack 10. Look Up Definitions
Do you find yourself smiling knowingly when your boss mentions
that well-known business principle that you've never heard of?
Overwhelmed with "geek speak"? Chances are Google's heard it
mentioned, and possibly even defined, somewhere before.
Most specialized vocabularies remain, for the most part, fairly
static; words don't suddenly change their meaning all that often.
Not so with technical and computer-related jargon. It seems like
every 12 seconds someone comes up with a new buzzword or
term relating to computers or the Internet, and then 12 minutes
later it becomes obsolete or means something completely
different, often more than one thing at a time. Maybe it's not that
bad. It just feels that way.
Google can help you in two ways, by helping you look up words
and by helping you figure out what words you don't know but
need to know.
1.22.1. Google Definitions
Before you assume you're going to be in for a lot of Googling, try
the define search syntax mentioned in the "Quick Links" section
earlier in this chapter. Simply prefix the word you're after
with the special syntax keyword define, like so:
define google
define julienne
define 42
Google tells us that these are defined as "most important
spidering search engine," "cut a vegetable into long thin
matchsticks," and "being two more than forty," thanks to
Juice New Media's Search Engine Glossary, The Youth Online
Club, and WordNet at Princeton, respectively.
Click the associated "Definition in context" link to visit the page
from which the definition was drawn.
Click the "Web definitions for..." link or prefix the word you're
defining with define: (note the addition of a colon) in the first
place, and you'll net a full page of definitions drawn from all
manner of places. For instance, define:TLA turns up oodles
of definitions (all about the same, mind you), as shown in Figure
1-24.
Figure 1-24. A page chock-full of definitions for
TLA
The define word syntax is still subject to
spelling suggestions, so you don't have to
worry too much about misspelling. The
define:word form, however, doesn't perform
a web search at all and so will return no
results or spelling suggestions whatsoever
if it finds no definitions to offer you.
If all that didn't turn up anything useful, move on to Google Web
Search proper.
1.22.2. Slang
We have distinctive speech patterns that are shaped by our
educations, our families, and our location. Further, we may use
another set of words based on our occupation. When a teenager
says something is "phat," that's slang: a specialized vocabulary
used by a particular group. When a copywriter scribbles "stet"
on an ad, that's not slang, but it's still specialized vocabulary or
jargon used by a certain group, in this case the advertising
industry.
Being aware of these specialty words can make all the difference
when it comes to searching. Adding specialized words to your
search query, whether slang or industry jargon, can really change
the slant of your search results.
Slang gives you one more way to break up your search engine
results into geographically distinct areas. There's some
geographical blurriness when you use slang to narrow your
search engine results, but it's amazing how well it works. For
example, search Google for football. Now search for football
bloke. Totally different results set, isn't it? Now search for
football bloke bonce. Now you're into soccer narratives.
Of course, this is not to say that everyone in England
automatically uses the word "bloke" any more than everyone in
the southern U.S. automatically uses the word "y'all." But adding
well-chosen bits of slang (which will take some experimentation)
will give a whole different tenor to your search results and may
point you in unexpected directions. You can find slang from the
following resources:
The Probert Encyclopedia: Slang
(http://www.probertencyclopaedia.com/slang.htm)
This site is browseable by first letter or searchable by
keyword. (Note that the keyword search covers the
entire Probert Encyclopedia; slang results are near the
bottom.) Slang is from all over the world. It's often
cross-linked, especially drug slang. As with most slang
dictionaries, this site contains material that might
offend.
A Dictionary of Slang (http://www.peevish.co.uk/slang/)
This site focuses on slang heard in the United Kingdom,
which means slang from other places as well. It's
browseable by letter or via a search engine. Words from
outside the U.K. are marked with their place of origin in
brackets. Definitions also indicate typical usage:
humorous, vulgar, derogatory, etc.
Surfing for Slang (http://www.spraakservice.net/slangportal)
Of course, each area in the world has its own slang. This
site has a good meta-list of English and Scandinavian
slang resources.
Start out by searching Google for your query without the slang.
Check the results and decide where they're falling short. Are
they not specific enough? Are they not located in the right
geographical area? Are they not covering the right
demographic (teenagers, for example)?
Introduce one slang word at a time. For example, for a search for
"football," add the word "bonce" and check the results. If they're
not narrowed down enough, add the word "bloke." Add one word
at a time until you get the results that you want. Using slang is
an inexact science, so you'll have to do some experimenting.
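The one-word-at-a-time approach amounts to building a series of progressively narrower queries. This small Python sketch generates that series from a base query and a list of candidate slang words, so you can run them in order and stop when the results look right. The function name is made up for the example; judging the results is still up to you.

```python
def incremental_queries(base, extra_words):
    """Yield the base query, then the query with one more word each time.

    Hypothetical helper: it builds query strings only and performs
    no searches.
    """
    query = base
    yield query
    for word in extra_words:
        query = f"{query} {word}"
        yield query


for query in incremental_queries("football", ["bonce", "bloke"]):
    print(query)
```

Reordering the candidate words changes the intermediate queries, which is a cheap way to do the experimenting the text recommends.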
Here are some things to be careful of when using slang in your
searches:
Try many different slang words.
Don't use slang words that are generally considered
offensive, except as a last resort. Your results will be
skewed.
Be careful when using teenage slang, which changes
constantly.
Try searching for slang when using Google Groups. Slang
crops up often in conversation.
Minimize your searches for slang when searching for
more formal sources, such as newspaper stories.
Don't use slang phrases if you can help it; in my
experience, slang changes too much to be consistently
searchable. Stick to established words.
1.22.3. Industrial Slang
Specialized vocabularies are those used in particular subject
areas and industries. The medical and legal fields are good
examples of specialized vocabularies, although there are many
others.
When you need to tip your search to the more technical, the
more specialized, and the more in-depth, think of a specialized
vocabulary. For example, do a Google search for heartburn. Now
do a search for heartburn GERD. Now do a search for heartburn GERD
"gastric acid". You'll see that each set of results is very different.
With some fields, finding specialized vocabulary resources will
be a snap. But with others, it's not that easy. As a jumping-off
point, try the Glossarist site at http://www.glossarist.com, which
is a searchable subject index of about 6,000 different glossaries
covering dozens of different topics. There are also several other
large online resources covering certain specific vocabularies.
These resources include:
The On-Line Medical Dictionary (http://cancerweb.ncl.ac.uk/omd/)
This dictionary contains vocabulary relating to
biochemistry, cell biology, chemistry, medicine,
molecular biology, physics, plant biology, radiobiology,
science, and technology, and currently has over 46,000
listings.
You can browse the dictionary by letter or search it by
word. Sometimes you can search for a word that you
know (bruise) and find another term that might be more
common in medical terminology (contusion). You can also
browse the dictionary by subject. Bear in mind that this
dictionary is in the U.K. and some spellings may be
slightly different for American users (e.g., "tumour"
versus "tumor").
MedTerms.com (http://www.medterms.com/)
MedTerms.com has far fewer definitions (around
15,000), but it also has extensive articles from
MedicineNet. If you're starting from absolute square one
with your research and you need some basic information
and vocabulary to get started, search MedicineNet for
your term (bruise works well) and then move to
MedTerms.com to search for specific words.
Law.com's legal dictionary (http://dictionary.law.com/lookup2.asp)
Law.com's legal dictionary is excellent because you can
search either words or definitions; you can browse, too.
For example, you can search definitions for the word
inheritance and get a list of all the entries that contain
the word "inheritance." This is an easy way to get to the
words "muniment of title" without knowing the path.
As with slang, add specialized vocabulary slowly, one word at a
time, and anticipate that it will narrow down your search results
very quickly. For example, take the word "spudding," often used
in association with oil drilling. Searching for spudding by itself
finds only about 4,300 results on Google. Adding Texas knocks it
down to 581 results, and this is still a very general search! Add
specialized vocabulary very carefully or you'll narrow down your
search results to the point where you can't find what you want.
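The one-word-at-a-time approach can be sketched in a few lines of Python. This is only an illustration; the function name narrowing_queries is made up for this example and is not part of any Google tool. It simply produces the successive queries you would type by hand:

```python
def narrowing_queries(base, extra_terms):
    """Return successive queries, each adding one more term to the last.

    `base` is the starting query; `extra_terms` are the specialized
    words (or quoted phrases) to add, in the order you want to try them.
    """
    queries = [base]
    for term in extra_terms:
        # Each new query is the previous one plus exactly one more term.
        queries.append(queries[-1] + " " + term)
    return queries


if __name__ == "__main__":
    for query in narrowing_queries("heartburn", ["GERD", '"gastric acid"']):
        print(query)
```

Run each query in turn and stop as soon as the result count is manageable; there is no need to use every term on the list.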
1.22.4. Researching Terminology with Google
First things first: for heaven's sake, please don't just plug the
abbreviation into the query box! For example, searching for XSLT
will net you over two million results. While combing through the
sites that Google turns up may eventually lead you to a
definition, there's simply more to life than that. Instead, add
"stands +for" to the query if it's an abbreviation or acronym.
"XSLT stands +for" returns around 194 results, and the third is a
tutorial glossary. If you're still getting too many results ("XML
stands +for" gives you almost 3,000 results), try adding
beginners or newbie to the query. "XML stands +for" beginners
brings in 227 results, the third being "XML for beginners."
If you're still not getting the results you want, try "What is X?" or
"X +is short +for" or X beginners FAQ, where X is the acronym or
term. These should be regarded as second-tier methods,
because most sites don't tend to use phrases such as "What is
X?" on their pages, "X is short for" is uncommon language
usage, and X might be so new (or so obscure) that it doesn't yet
have a FAQ entry. Then again, your mileage may vary and it's
worth a shot; there's a lot of terminology out there.
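If you look up terms often, the query patterns above are easy to generate. The following is a small sketch, not an official tool; the function name terminology_queries and its is_acronym flag are assumptions made for this example:

```python
def terminology_queries(term, is_acronym=True):
    """Return lookup queries for `term`, first-tier patterns first."""
    queries = []
    if is_acronym:
        # First tier: the "stands +for" phrase, then a beginner-flavored variant.
        queries.append(f'"{term} stands +for"')
        queries.append(f'"{term} stands +for" beginners')
    # Second tier: less common phrasings that are still worth a shot.
    queries.append(f'"What is {term}?"')
    queries.append(f'"{term} +is short +for"')
    queries.append(f"{term} beginners FAQ")
    return queries


if __name__ == "__main__":
    for query in terminology_queries("XSLT"):
        print(query)
```

The queries come out in the order the text suggests trying them, so you can work down the list until one of them turns up a usable definition.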
If you have hardware- or software-specific terminology (as
opposed to hardware- or software-related), try the word or phrase
along with anything you might know about its usage. For
example, as a Perl module, DynaLoader is software-specific
terminology. That much known, simply give the two words a spin:
DynaLoader Perl
If the results you're finding are too advanced, assuming you
already know what a DynaLoader is, start playing with the words
beginners, newbie, and the like to bring you closer to information
for beginners:
DynaLoader Perl Beginners
If you still can't find the word in Google, there are a few possible
causes: perhaps it's slang specific to your area, your coworkers
are playing with your mind, you heard it wrong (or there was a
typo on the printout that you got), or it's very, very new.
1.22.5. Where to Go When It's Not on Google
Suppose that, despite your best efforts, you're not finding good
explanations of the terminology on Google. There are a few other
sites that might have what you're looking for:
Whatis (http://whatis.techtarget.com)
A searchable subject index of computer terminology,
from software to telecom. This is especially useful if
you've got a hardware- or software-specific word
because the definitions are divided up into categories.
You can also browse alphabetically. Annotations are
good and are often cross-indexed.
Webopedia (http://www.pcwebopaedia.com/)
Searchable by keyword or browseable by category. This
site also has a list of the newest entries on the front
page so that you can check for new words.
Netlingo (http://www.netlingo.com/framesindex.html)
This is more Internet oriented. This site shows up with a
frame on the left containing the words, with the
definitions on the right. It includes lots of
cross-referencing and really old slang.
Tech Encyclopedia (http://www.techweb.com/encyclopedia/)
Features definitions and information on over 20,000
words. The top 10 terms searched for are listed so that
you can see if everyone else is as confused as you are.
Though entries had before-the-listing and after-the-listing
lists of words, I saw only moderate cross-referencing.
Geek terminology proliferates almost as quickly as web pages.
Don't worry too much about deliberately keeping up; it's just
about impossible. Instead, use Google as a "ready reference"
resource for definitions.
Hack 11. Search Article Archives
Google serves as a handy searchable archive for back issues of
online publications.
Not all sites have their own search engines, and even the ones
that do are sometimes difficult to use. Complicated or
incomplete search engines are more pain than gain when
attempting to search through archives of published articles. If
you follow a couple of rules, Google is handy for finding back
issues of published resources.
The trick is to use a common phrase to find the information
you're looking for. Let's use The New York Times as an example.
1.23.1. Articles from the NYT
Your first intuition when searching for previously published
articles from NYTimes.com might be to simply use
site:nytimes.com in your Google query. For example, if I wanted to
find articles on George Bush, why not use:
"george bush" site:nytimes.com
This will indeed find you all articles mentioning George Bush
published on NYTimes.com. What it won't find is all the articles
produced by The New York Times but republished elsewhere.
While doing research, keep credibility
firmly in mind. If you're doing casual
research, maybe you don't need to double-check
a story to make sure that it actually
comes from The New York Times, but if
you're researching a term paper, double-check
the veracity of every article you find
that isn't actually on The New York Times
site.
What you actually want is a clear identifier, no matter the site of
origin, that an article comes from The New York Times. Copyright
disclaimers are perfect for the job. A New York Times copyright
notice typically reads:
Copyright 2004 The New York Times Company
Of course, this would only find articles from 2004. A simple
workaround is to replace the year with a Google full-word
wildcard ["Full-Word Wildcards" earlier in this chapter]:
Copyright * The New York Times Company
Let's try that George Bush search again, this time using the
snippet of copyright disclaimer instead of the site: restriction:
"Copyright * The New York Times Company" "George Bush"
At the time of this writing, you get over six times as many
results for this search as for the earlier attempt.
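To compare the two approaches for any publication, you might generate both queries side by side. Here is a minimal sketch; the function name archive_queries and the dictionary keys are invented for this example, and you have to supply the disclaimer text yourself after checking what the publication actually prints:

```python
def archive_queries(topic, disclaimer, domain):
    """Build two Google queries for finding a publication's articles.

    by_disclaimer matches the copyright phrase wherever the article
    has been republished; by_site only matches the publication's own site.
    """
    phrase = f'"{topic}"'
    return {
        "by_disclaimer": f'"{disclaimer}" {phrase}',
        "by_site": f"{phrase} site:{domain}",
    }


if __name__ == "__main__":
    queries = archive_queries(
        "George Bush",
        "Copyright * The New York Times Company",  # * is the full-word wildcard
        "nytimes.com",
    )
    for name, query in queries.items():
        print(name + ": " + query)
```

Try the disclaimer query first, and keep the site: query in reserve for when the disclaimer returns fewer results than you expect.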
1.23.2. Magazine Articles
Copyright disclaimers are also useful for finding magazine
articles. For example, Scientific American's typical copyright
disclaimer looks like this:
Scientific American, Inc. All rights reserved.
(The date appears before the disclaimer, so I just dropped it to
avoid having to bother with wildcards.)
Using that disclaimer as a quote-delimited phrase along with a
search word (hologram, for example) yields the Google query:
hologram "Scientific American, Inc. All rights reserved."
At the time of this writing, you'll get 31 results, which seems
like a small number for a general query like hologram. When you
get fewer results than you'd expect, fall back on using the site:
syntax to go back to the originating site itself:
hologram site:sciam.com
In this example, you'll find several results that you can grab
from Google's cache but are no longer available on the Scientific
American site.
Most publications that I've come across have some kind of
common text string that you can use when searching Google for
their archives. Usually it's a copyright disclaimer and most often
it's at the bottom of a page. Use Google to search for that string
and whatever query words you're interested in, and if that
doesn't work, fall back on searching for the query string and
domain name.
Hack 12. Find Directories of Information
Use Google to find directories, link lists, and other collections of
information.
Sometimes you're more interested in large information
collections than scouring for specific bits and bobs. Using
Google, there are a couple of different ways of finding directories,
link lists, and other information collections. The first way makes
use of Google's full-word wildcards ["Full-Word Wildcards" earlier
in this chapter] and the intitle: syntax ["Special Syntax" earlier in
this chapter]. The second is judicious use of particular keywords.
1.24.1. Title Tags and Wildcards
Pick something you'd like to find collections of information
about. We'll use "trees" as our example. The first thing we'll look
for is any page with the words "directory" and "trees" in its title.
In fact, we'll build in a little buffering for words that might appear
between the two using a couple of full-word wildcards (*
characters). The resultant query looks something like this:
intitle:"directory * * trees"
This query will find "directories of evergreen trees," "South
African trees," and of course "directories containing simply
trees."
What if you wanted to take things up a notch, taxonomically
speaking, and find directories of botanical information? You'd
use a combination of intitle: and keywords, like so:
botany intitle:"directory of"
And you'd get almost 1,000 results. Changing the tenor of the
information might be a matter of restricting results to those
coming from academic institutions. Appending an edu site
specification brings you to:
botany intitle:"directory of" site:edu
This gets you around 150 results, a mixture of resource
directories and, unsurprisingly, directories of university
professors.
Mixing these syntaxes works rather well when you're searching
for something that might also be an offline print resource. For
example:
cars intitle:"encyclopedia of"
This query pulls in results from Amazon.com and other sites
selling car encyclopedias. Filter out some of the more obvious
book finds by tweaking the query slightly:
cars intitle:"encyclopedia of" -site:amazon.com -inurl:book -inurl:products
The query specifies that search results should not come from
Amazon.com and should not have the word "products" or "book"
in the URL, which eliminates a fair number of online stores. Play
with this query by changing the word "cars" to whatever you'd
like for some interesting finds.
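If you find yourself filtering out the same stores again and again, a tiny helper can append the exclusions for you. This is a sketch only; the function name exclude_stores and its parameters are made up for this example:

```python
def exclude_stores(query, sites=(), url_words=()):
    """Append -site: and -inurl: exclusions to an existing query."""
    parts = [query]
    # One -site: term per domain to keep out of the results.
    parts.extend(f"-site:{site}" for site in sites)
    # One -inurl: term per word that should not appear in a result's URL.
    parts.extend(f"-inurl:{word}" for word in url_words)
    return " ".join(parts)


if __name__ == "__main__":
    print(exclude_stores(
        'cars intitle:"encyclopedia of"',
        sites=["amazon.com"],
        url_words=["book", "products"],
    ))
```

Keep Google's 10-word query limit in mind: every exclusion you add leaves less room for the words you actually care about.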
Of course there are lots of sites selling
books online, but when it comes to
injecting "noise" into results when you're
trying to find online resources and
research-oriented information,
Amazon.com is the biggest offender. If
you're actually looking for books, try
+site:amazon.com instead.
If mixing syntaxes doesn't do the trick for the resources you
want, there are some clever keyword combinations that might.
1.24.2. Finding Searchable Subject Indexes with Google
There are a few major searchable subject indexes and myriad
minor ones that deal with a particular topic or idea. You can find
the smaller subject indexes by customizing a few generic
searches. "what's new" "what's cool" directory, while gleaning a
few false results, is a great way of finding searchable subject
indexes. directory "gossamer threads" new is an interesting one.
Gossamer Threads is the creator of a popular link directory
program. This is a good way to find searchable subject indexes
without too many false hits. directory "what's new" categories cool
doesn't work particularly well, because the word "directory" is
not a very reliable search term, but you will pull in some things
with this query that you might otherwise have missed.
Let's put a few of these into practice:
"what's new" "what's cool" directory phylum
"what's new" "what's cool" directory carburetor
"what's new" "what's cool" directory "investigative journalism"
"what's new" directory categories gardening
directory "gossamer threads" new sailboats
directory "what's new" categories cool "basset hounds"
The real trick is to use a more general word, but make it unique
enough that it applies mostly to your topic and not to many other
topics.
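The generic searches above are really templates with a topic bolted on the end, which makes them easy to generate. The following sketch assumes nothing beyond the query patterns already shown; the names TEMPLATES and directory_queries are invented for this example:

```python
# Generic subject-index searches from the text, with a slot for the topic.
TEMPLATES = [
    '"what\'s new" "what\'s cool" directory {topic}',
    '"what\'s new" directory categories {topic}',
    'directory "gossamer threads" new {topic}',
    'directory "what\'s new" categories cool {topic}',
]


def directory_queries(topic):
    """Return one query per template for the given topic."""
    if " " in topic:
        # Multiword topics are searched as a quoted phrase.
        topic = f'"{topic}"'
    return [template.format(topic=topic) for template in TEMPLATES]


if __name__ == "__main__":
    for query in directory_queries("basset hounds"):
        print(query)
```

Swap in your own topic word and run each query; which template works best varies quite a bit from subject to subject.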
Take acupuncture, for instance. Start narrowing it down by topic:
what kind of acupuncture? For people or animals? If for people,
what kind of conditions are being treated? If for animals, what
kind of animals? Maybe you should be searching for "cat
acupuncture", or maybe you should be searching for acupuncture
arthritis. If this first round doesn't narrow down search results
enough for you, keep going. Are you looking for education or
treatment? You can skew results one way or the other by using
the site: syntax. So maybe you want "cat acupuncture" site:com
or arthritis acupuncture site:edu. Just by taking a few steps to
narrow things down, you can get a reasonable number of search
results focused around your topic.
Hack 13. Seek Out Weblog Commentary
Build queries to search only recent commentary appearing in
weblogs.
There was a time when, if you needed to find current commentary,
you didn't turn to a full-text search engine like Google. You
searched Usenet, combed mailing lists, or searched through
current news sites like CNN.com and hoped for the best.
But as search engines have evolved, they've been able to index
pages more quickly than once every few weeks. In fact, Google
tunes its engine to more readily index sites with a high
information churn rate. At the same time, a phenomenon called
the weblog (http://www.oreilly.com/catalog/essblogging/) has
arisen: an online site that keeps a running commentary and
associated links, updated daily, and indeed even more often in
many cases. Google indexes many of these sites on an
accelerated schedule. If you know how to find them, you can
build a query that searches just these sites for recent
commentary.
1.25.1. Finding Weblogs
When weblogs first appeared on the Internet, they were generally
updated manually or by using homemade programs. Thus, there
were no standard words you could add to a search engine to find
them. Now, however, many weblogs are created using either
specialized software packages (such as Movable Type,
http://www.movabletype.org, or Radio Userland,
http://radio.userland.com) or web services (such as Blogger,
http://www.blogger.com/). These programs and services are
more easily found online with some clever use of special
syntaxes or magic words.
For hosted weblogs, the site: syntax makes things easy. Blogger
weblogs hosted at blog*spot (http://www.blogspot.com) can be
found using site:blogspot.com. Even though Radio Userland is a
software program able to post its weblogs to any web server, you
can find the majority of Radio Userland weblogs at the Radio
Userland community server (http://radio.weblogs.com) using
site:radio.weblogs.com.
Finding weblogs powered by weblog software and hosted
elsewhere is more problematic; Movable Type weblogs, for
example, can be found all over the Internet. However, most of
them sport a "powered by movable type" link of some sort;
searching for the phrase "powered by movable type" will, therefore,
find many of them.
It comes down to magic words typically found on weblog pages,
shout-outs, if you will, to the software or hosting sites. The
following is a list of some of these packages and services and
the magic words used to find them in Google:
Blogger
"powered by blogger" or site:blogspot.com
Blosxom
"powered by blosxom"
Greymatter
"powered by greymatter"
Geeklog
"powered by geeklog"
Manila
"a manila site" or site:editthispage.com
Pitas (a service)
site:pitas.com
pMachine
"powered by pmachine"
uJournal (a service)
site:ujournal.org
LiveJournal (a service)
site:livejournal.com
Radio Userland
intitle:"radio weblog" or site:radio.weblogs.com
WordPress
"powered by wordpress"
1.25.2. Using These "Magic Words"
Because you can't have more than 10 words in a Google query,
there's no way to build a query that includes every conceivable
weblog's magic words. It's best to experiment with the various
words, and see which weblogs have the materials you're
interested in.
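Since the magic words won't all fit in one query, one option is to generate a separate query per weblog package. Here is a rough sketch. The names MAGIC_WORDS, count_words, and weblog_queries are invented for this example, only the first magic word for each package is used, and the word counting is an assumption (each whitespace-separated term counts as one word), not Google's documented rule:

```python
# First magic word for each package or service listed above.
MAGIC_WORDS = {
    "Blogger": '"powered by blogger"',
    "Blosxom": '"powered by blosxom"',
    "Greymatter": '"powered by greymatter"',
    "Geeklog": '"powered by geeklog"',
    "Manila": '"a manila site"',
    "Pitas": "site:pitas.com",
    "pMachine": '"powered by pmachine"',
    "uJournal": "site:ujournal.org",
    "LiveJournal": "site:livejournal.com",
    "Radio Userland": 'intitle:"radio weblog"',
    "WordPress": '"powered by wordpress"',
}

WORD_LIMIT = 10  # the query limit mentioned in the text


def count_words(query):
    """Rough word count: quotes are ignored, whitespace separates words."""
    return len(query.replace('"', "").split())


def weblog_queries(search_terms):
    """Return {package: query} for every query that fits the word limit."""
    queries = {}
    for package, magic in MAGIC_WORDS.items():
        query = f"{search_terms} {magic}"
        if count_words(query) <= WORD_LIMIT:
            queries[package] = query
    return queries


if __name__ == "__main__":
    for package, query in weblog_queries('"baseball strike"').items():
        print(package + ": " + query)
```

Packages whose combined query would run past the limit are silently skipped, so a long search phrase may leave you with fewer queries than packages.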
First of all, realize that weblogs are usually informal commentary
and you'll have to keep an eye out for misspelled words, names,
etc. Generally, it's better to search by event than by name, if
possible. For example, if you were looking for commentary on a
potential strike, the phrase "baseball strike" would be a better
search, initially, than a search for the name of the Commissioner
of Major League Baseball, Bud Selig.
You can also try to search for a word or phrase relevant to the
event. For example, for a baseball strike you could try searching
for "baseball strike" "red sox" (or "baseball strike" bosox). If
you're searching for information on a wildfire and wondering if
anyone had been arrested for arson, try wildfire arrested and, if
that doesn't work, wildfire arrested arson.
Why not search for arson to begin with?
Because it's not certain that a weblog
commentator would use the word "arson."
Instead, he might just refer to someone
being arrested for setting the fire.
"Arrested" in this case is a more certain
word than "arson."
Hack 14. Cover Your Bases
Try all possible combinations of your search keywords at once.
You've got a set of query words but are not sure that they're the
right set; you certainly don't want to miss any results by picking
the wrong combination of keywords, including or excluding the
wrong word. But the thought of typing in a dozen-plus
permutations of keywords has your carpal tunnel flaring up in
horror.
Search Grid (http://blog.outer-court.com/search-grid) lets you
explore a wide range of Google search results by automatically
searching for the various possible combinations of your
keywords.
There are two versions of Search Grid. The older version
features a grid that you fill with search words that you want to
combine. You might, for example, put catsup, mustard, and
pickles on the x-axis and relish, onions, and tomatoes on the
y-axis, as shown in Figure 1-25.
Figure 1-25. Search Grid populated with keywords to combine
Search Grid combines the keywords (relish catsup, relish mustard,
relish pickles, onions catsup, onions mustard, onions pickles,
etc.) and provides you with the first result of each possible
combination, shown in Figure 1-26.
Figure 1-26. The first result of several different searches, all in one grid
Note that you're not getting anything but the first result; this is
not the tool to use if you want a very in-depth search of each
query. Instead, it's meant to give you a bird's-eye view of how
the different combinations of search words impact the query.
There's a new version of Search Grid that's been integrated into
a web tool called FindForward (http://www.findforward.com/?t=grid),
which gives you screenshots of some Google search
results. That one requires less typing. Just enter two to five
words for which you want to check possible permutations. You'll
get a large grid of search results, with screenshots available for
some of the pages, as shown in Figure 1-27.
Figure 1-27. Google search results, now with screenshots!
Note that this grid will search each of your keywords individually
(one square for mustard, one for pickles, one for relish) and will
search every possible combination of two words (pickles relish,
pickles mustard, mustard relish, etc.), but it won't search for
three- and four-word permutations. In other words, this tool
won't find every single last possible permutation of your search.
Again, it's an overview that gives you an idea of how different
word combinations can affect your search, and it is not meant to
be exhaustive.
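To see exactly which queries each kind of grid covers, you can enumerate them yourself. The sketch below uses Python's standard itertools module; the function names grid_queries and single_and_pair_queries are invented for this example and are not how Search Grid or FindForward is actually implemented:

```python
from itertools import combinations, product


def grid_queries(x_axis, y_axis):
    """Older-style grid: pair every y-axis word with every x-axis word."""
    return [f"{y} {x}" for y, x in product(y_axis, x_axis)]


def single_and_pair_queries(words):
    """Newer-style grid: each word alone, then every two-word combination."""
    singles = list(words)
    pairs = [" ".join(pair) for pair in combinations(words, 2)]
    return singles + pairs


if __name__ == "__main__":
    for query in grid_queries(["catsup", "mustard", "pickles"],
                              ["relish", "onions", "tomatoes"]):
        print(query)
    for query in single_and_pair_queries(["mustard", "pickles", "relish"]):
        print(query)
```

Three words on each axis give nine grid queries. Three words in the newer style give three singles plus three pairs; the pair count grows quickly as you add words, which is presumably why three- and four-word permutations are left out.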
Use this hack when you want to get a sense of how different
queries are going to affect your search, when you're not sure
about what set of search words will work best for you, and when
you want to experiment with expanding your search without
having to type several sets of keywords over and over again.
1.26.1. See Also
It's hard to believe, but when it comes to building
queries, word order matters. If you're interested in
permuting just one query at a time, see [Hack #28].
Hack 15. Repetition Matters
Repetition matters when it comes to keywords weighting your
queries.
Using keywords multiple times can have an impact on the types
and number of results you get.
Don't believe me? Try searching for internet. At the time of this
writing, the home page for Microsoft's Internet Explorer browser
is the first result. Now try searching for internet internet. At the
time of this writing, the Internet Society (ISOC) pops to the top.
Experiment with this using other words, putting additional query
words in if you want to. You'll see that multiple query words can
have an impact on how the search results are ordered and on the
number of results returned.
1.27.1. How Does This Work?
Google doesn't talk about this on their web site, so this hack is
the result of some conjecture and much experimentation.
First, enter a word one time. Let's use clothes as an example,
which returns 51,800,000 results, the top being a site called
The Emperor's New Clothes. Now, add another clothes to the
query (i.e., clothes clothes). The number of results drops
dramatically to 14,700,000, and the first result is for Traditional
Korean Clothing; The Emperor's New Clothes is no longer in the
top 10, and some new finds move their way up into the first few
results.
Why stop now? Try clothes clothes clothes. Traditional Korean
Clothing stays on top and The Emperor's New Clothes is still not
found in the top 10.
1.27.2. A Theory
Here's a theory: Google searches for as many matches as it can
for each word or phrase you specify, stopping when it can't find
any more. So clothes clothes returns pages with two occurrences
of the word "clothes." clothes clothes clothes returns the same
results, because Google can't do any better than two occurrences
of "clothes" in any one page.
1.27.3. So What?
Because Google discards nonmatching multiple instances of the
same query word, you can use this search as a weighting system
for your searches. For example, say you were interested in pipe
systems for the gas industry, but you're more interested in the
impact that the pipe systems were having on the gas industry
(and less so in companies that happen to sell piping systems for
the gas industry).
Search for "pipe systems" gas. Now query for "pipe systems" gas
gas. You'll notice that the focus of your results changes slightly.
Now try "pipe systems" pipe pipe gas gas. Note how the focus
slants back the other way.
Based on observations, here are a few guidelines for using
multiple iterations of the same query term:
Multiple iterations of product names or nouns seem to favor shopping sites. This is especially true if the name or noun is plural (e.g., scooters).
Just because you're not getting different results for the second or third iteration doesn't mean you won't get different results for the fourth or fifth iteration (e.g., successive occurrences of baseball).
Remember that Google has a limit of 10 words per query, so relegate repetition to only those situations where you can spare the query room [Hack #29].
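If you want to experiment with repetition systematically, a short helper can build the repeated-word queries and keep them within the limit. This is only a sketch: the name weighted_query is made up, and counting each whitespace-separated term as one word is an assumption rather than a documented Google rule:

```python
WORD_LIMIT = 10  # Google's per-query word limit, as described in the text


def weighted_query(phrase, weights):
    """Build a query that repeats each word `count` times.

    `phrase` is an optional quoted phrase placed first; `weights` maps
    a word to the number of times it should be repeated.
    """
    parts = [f'"{phrase}"'] if phrase else []
    for word, count in weights.items():
        parts.extend([word] * count)
    query = " ".join(parts)
    # Count words the simple way: strip quotes, split on whitespace.
    word_count = len(query.replace('"', "").split())
    if word_count > WORD_LIMIT:
        raise ValueError(f"query uses {word_count} words; limit is {WORD_LIMIT}")
    return query


if __name__ == "__main__":
    print(weighted_query("pipe systems", {"gas": 2}))
    print(weighted_query("pipe systems", {"pipe": 2, "gas": 2}))
```

Raising an error rather than silently trimming the query is a deliberate choice here: dropping a repetition would quietly change the weighting you asked for.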
Hack 16. Search a Particular Date Range
An undocumented but powerful feature of Google's search is
the ability to search within a particular date range.
Before delving into the actual use of date range searching, there
are a few things you should understand. The first is this: a date
range search has nothing to do with the creation date of the
content and everything to do with the indexing date of the
content. If I create a page on March 8, 1999, and Google
doesn't get around to indexing it until May 22, 2002, for the
purposes of a date range search, the date in question is May 22,
2002.
The second thing is that Google can index pages several times,
and each time it does so the date on it changes. So don't count
on a date range search staying consistent from day to day. The
daterange: timestamp can change when a page is indexed more
than once. Whether it does change depends on whether the
content of the page has changed.
Third, Google doesn't "stand behind" the results of a search
done using the date range syntaxes. So if you get a weird result,
you can't complain to them. Google would rather you use the
date range options on their Advanced Search page, but that
page allows you to restrict your options only to the last three
months, six months, or year.
1.28.1. The daterange: Syntax
Why would you want to search by daterange:? There are several
reasons:
It narrows down your search results to fresher content. Google might find some obscure, out-of-the-way page and index it only once. Two years later, this obscure, never-updated page is still turning up in your search results. Limiting your search to a more recent date range will result in only the most current of matches.
It helps you dodge current events. Say John Doe sets a world record for eating hot dogs and immediately afterward rescues a baby from a burning building. Less than a week after that happens, Google's search results are going to be filled with John Doe. If you're searching for information on (another) John Doe, babies, or burning buildings, you'll scarcely be able to get rid of him. However, you can avoid Mr. Doe's exploits by setting the date range syntax to before the hot dog contest. This also works well for avoiding recent, heavily covered news events, such as a crime spree or a forest fire, and annual events of at least national importance such as national elections or the Olympics.
It allows you to compare results over a period of time; for example, if you want to search for occurrences of "Mac OS X" and "Windows XP" over a period of time. Of course, a count like this isn't foolproof; indexing dates change over time. But generally it works well enough that you can spot trends.
Using the daterange: syntax is as simple as:
daterange:startdate-enddate
The catch is that the date must be expressed as a Julian date
(read the sidebar, "Understanding Julian Dates"). So, for
example, July 8, 2002, is Julian date 2452463.5 and May 22,
1968, is 2439998.5. Furthermore, Google isn't fond of decimals
in its daterange: queries; use only integers: 2452463 or
2452464 (depending on whether you prefer to round down or up)
in the first example.
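If you'd rather not rely on an online converter, the arithmetic is easy to do yourself. The following Python sketch uses only the standard datetime module. The function names julian_date and daterange_day are made up for this example, and the constant relies on the fact that Python's date.toordinal() counts days from January 1 of year 1, a fixed 1,721,424.5 days after the start of the Julian day count:

```python
from datetime import date

# Julian date at midnight (start of day) minus Python's proleptic
# Gregorian ordinal for the same day. The difference is a constant.
JD_OFFSET = 1721424.5


def julian_date(day):
    """Julian date for midnight at the start of `day` (a datetime.date)."""
    return day.toordinal() + JD_OFFSET


def daterange_day(day, round_up=False):
    """Integer form for daterange:, rounding the .5 down (default) or up."""
    whole = int(julian_date(day))  # drops the .5
    return whole + 1 if round_up else whole


if __name__ == "__main__":
    print(julian_date(date(2002, 7, 8)))
    print(julian_date(date(1968, 5, 22)))
    print(daterange_day(date(2002, 7, 8)))
    print(daterange_day(date(2002, 7, 8), round_up=True))
```

Whichever way you round, round both ends of a range the same way so that the window keeps its intended length.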
Understanding Julian Dates
While date-based searching is fantastically useful,
date-based searching with Julian dates is annoying at
best, for a human, anyway.
A Julian date is just one number. It's not broken up into
month, day, and year. It's the number of days that have
passed since January 1, 4713 B.C. Unlike Gregorian
days (those on the calendar you and I use every day),
which begin at midnight, Julian days begin at noon,
making them useful for astronomers.
While problematic for humans, they're rather handy for
computer programming, because to change dates you
simply have to add and subtract from one number and
not worry about month and year changes, not to mention
leap years and the differing number of days in each
month. Google's daterange: special syntax element
employs Julian dates.
As if things weren't confusing enough, there is actually
another date format that is also known as a Julian date
format: a five-digit number, yyddd, where the first two
digits represent the last two digits of the year
and the last three represent the day of the year, where
the value is between 1 and 365 (or 366 in a leap year).
Google's daterange: syntax doesn't support the yyddd
format.
There are plenty of places you can convert Julian dates
online. There are a couple of nice converters at the U.S.
Naval Observatory Astronomical Applications
Department
(http://aa.usno.navy.mil/data/docs/JulianDate.html) and
Mauro Orlandini's home page
(http://www.tesre.bo.cnr.it/~mauro/JD/), the latter
converting either Julian to Gregorian or vice versa.
More Julian dates and online computers can be found
via a Google search for julian date
(http://www.google.com/search?q=julian+date).
You can use the daterange: syntax with most other Google
special syntaxes. The exceptions are the link: syntax, which
doesn't mix well with other special syntaxes ["Mixing Syntaxes"
earlier in this chapter], and other magic words (e.g., stocks: and
phonebook:).
daterange: does wonders for narrowing your search results. Let's
look at a couple of examples. Geri Halliwell left the Spice Girls
around May 27, 1998. If you wanted to get a lot of information
about the breakup, you could try doing a date search in a 10-day
window (say, May 25 to June 4). That query would look like this:
"Geri Halliwell" "Spice Girls" daterange:2450958-2450968
At the time of this writing, you'll get about 16 results, including
several news stories about the breakup. If you wanted to find
less formal sources, search for Geri or Ginger Spice instead of
Geri Halliwell.
That example's a bit on the silly side, but you get the idea. Any
event that you can clearly divide into before and after dates (an
event, a death, an overwhelming change in circumstances) can be
reflected in a date range search.
You can also use an individual event's date to change the
results of a larger search. For example, former ImClone CEO
Sam Waksal was arrested on June 12, 2002. You don't have to
search for the name Sam Waksal to get a very narrow set of
results for June 13, 2002:
imclone daterange:2452439-2452439
Similarly, if you search for imclone before the date of 2452439,
you'll get very different results. As an interesting exercise, try a
search that reflects the arrest, but end its date range a few days
before the actual arrest:
imclone investigated daterange:2452000-2452435
This is a good way to find information or analysis that predates
the actual event, but that provides background that might help
explain the event itself. (Unless you use the date range search,
usually this kind of information is buried underneath news of the
event itself.)
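If you'd rather compute the Julian numbers than look them up, the arithmetic is short. The Python sketch below is not anything Google provides. It assumes the common convention in which a Julian day number is the day that begins at noon on the given Gregorian date, and the names julian_day and daterange_query are made up for illustration. Converters that count from midnight can differ by one, so check a known date before relying on it.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal (date.toordinal())
# and the Julian day number that begins at noon on that date.
# Assumption: noon-based convention; verify against a converter you trust.
JDN_OFFSET = 1721425


def julian_day(d):
    """Return the Julian day number beginning at noon on date d."""
    return d.toordinal() + JDN_OFFSET


def daterange_query(terms, start, end):
    """Append a daterange: clause covering start..end to the query terms."""
    return "%s daterange:%d-%d" % (terms, julian_day(start), julian_day(end))


if __name__ == "__main__":
    # The single-day ImClone search for June 13, 2002.
    print(daterange_query("imclone", date(2002, 6, 13), date(2002, 6, 13)))
```

Run as is, this prints imclone daterange:2452439-2452439, the single-day ImClone query shown earlier.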
If you'd prefer to perform Google date range searches without all
this nonsense about Julian date formats, use the FaganFinder
Google interface
(http://www.faganfinder.com/engines/google.shtml), an
alternative to the Google Advanced Search page, sporting
daterange: searching via a Gregorian (read: familiar) date
pull-down menu. In Figure 1-28, we're using the FaganFinder on our
Spice Girls breakup example.
Figure 1-28. The FaganFinder Google interface
with Gregorian-based date range searching
So that takes care of date range searching according to when
Google ran across the content you're after, but what about
narrowing your search results based on content creation date?
1.28.2. Searching by Content Creation
Date
Searching for materials based on content creation is difficult.
There's no standard date format (score one for Julian dates),
many people don't date their pages anyway, some pages don't
contain date information in their header, and still other content
management systems routinely stamp pages with today's date,
confusing things still further.
I can offer few suggestions for searching by content creation
date. Try adding a string of common date formats to your query.
If you wanted something from May 2003, for example, you could
try appending:
("May * 2003" | "May 2003" | 05/03 | 05/*/03)
A query like that uses up most of your 10-word limit, however, so
it's best to be judicious, perhaps by cycling through these
formats one at a time. If any one of these is giving you too many
results, try restricting your search to the title tag of the page.
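Cycling through the formats by hand gets old quickly. Here is a minimal Python sketch that produces the same four variants for any month, either as one parenthesized group or one at a time. The function names are made up for illustration, and the four formats are simply the ones used in the May 2003 example, not an exhaustive list.

```python
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def month_variants(year, month):
    """Return the four date-format variants for the given month."""
    name = MONTHS[month - 1]
    mm = "%02d" % month
    yy = "%02d" % (year % 100)
    return ['"%s * %d"' % (name, year),   # "May * 2003"
            '"%s %d"' % (name, year),     # "May 2003"
            "%s/%s" % (mm, yy),           # 05/03
            "%s/*/%s" % (mm, yy)]         # 05/*/03


def or_group(variants):
    """Join the variants into a single parenthesized OR group."""
    return "(" + " | ".join(variants) + ")"


def one_at_a_time(query, year, month):
    """Yield the query once per variant, to spare the 10-word limit."""
    for variant in month_variants(year, month):
        yield "%s %s" % (query, variant)


if __name__ == "__main__":
    print(or_group(month_variants(2003, 5)))
    for q in one_at_a_time("imclone", 2003, 5):
        print(q)
```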
If you're feeling really lucky, you can search for a full date, such
as May 9, 2003. Your decision then is whether you want to
search for the date in the format above or as one of many
variations: 9 May 2003, 9/5/2003, 9 May 03, and so forth.
Exact-date searching will severely limit your results and should be
used only as a last-ditch option.
When using date range searching, you'll have to be flexible in
your thinking, more general in your search than you otherwise
would be (because the date range search will narrow your results
substantially), and persistent in your queries, because different
dates and date ranges will yield very different results. That said,
you'll be rewarded with smaller result sets focused on very
specific events and topics in time.
Hack 17. Calculate Google Centuryshare
Determine the year in which Elvis achieved the height of his
fame, over what period disco took hold of your nightlife, and
when fuel economy actually mattered to anyone.
Looking to pin down the year something big happened or watch a
trend unfold gradually over time? FindForward
(http://www.findforward.com), née the Google Centuryshare
Calculator (http://blog.outer-court.com/centuryshare), employs
some of the same logic as the Google Mindshare Calculator
[Hack #34] to determine the weight of a search query across a
50-year period.
Let's say, for example, we search for Chernobyl, site of a terrible
nuclear power plant accident in April 1986. Enter the search
term (in this case, chernobyl) in FindForward's search box and
choose a range of years from the pull-down menu to the right.
Given that my choices were 1900-1950 or 1950-2000 and the
fact that I know I was alive when it happened, I chose
1950-2000. Click Find and the engine will chew on your query for a bit,
its backend feeding a steady stream of queries to Google via
the Google API. Figure 1-29 clearly shows that the Web knows a
little something about Chernobyl and the year 1986.
Figure 1-29. The Centuryshare Calculator
clearly shows something important happened at
Chernobyl in 1986
So, how does the Centuryshare algorithm work?
Centuryshare tries to find natural peaks for ideas in particular
years by searching the Web via Google. For every year, the
number of the year is combined with the search query: to find out
when Elvis Presley was at the height of his fame, the engine
searches for Elvis Presley 1950, Elvis Presley 1951, Elvis Presley
1952, and so on, keeping track of the returned result count along
the way.
But a simple count of results isn't quite enough. There is an
additional transformation of these numbers that needs to be
done in order for the result to be meaningful. Some years are
mentioned far more often online than others: 1900, 1910, and
1920 occur more frequently, as do the years in the late part of
the twentieth century (the boom of the Web).
So the Centuryshare calculator also gleans the result count for each
year by itself, without any additional search query (i.e., Google
for 1950, 1951, 1952, etc.).
Those base numbers in hand, the engine then calculates a
percentage based on the result count of year and search query
relative to year by itself, without search query.
These result count percentages are normalized for display
purposes and returned to you as a nice bar graph of results by
year.
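To make the arithmetic concrete, here is a minimal Python sketch of the calculation just described. It is not FindForward's actual code. The function names are made up, the result counts are passed in as plain dictionaries rather than fetched from Google, and "normalized for display" is assumed to mean scaling every share so that the peak year equals 1.0.

```python
def yearly_queries(query, first_year, last_year):
    """Pair each year with the combined query and the bare-year query."""
    return [(year, "%s %d" % (query, year), str(year))
            for year in range(first_year, last_year + 1)]


def centuryshare(combined_counts, year_counts):
    """Divide each combined count by the bare-year count, then rescale.

    combined_counts: {year: result count for "query year"}
    year_counts:     {year: result count for the year by itself}
    Assumption: normalizing means scaling so the peak share is 1.0.
    """
    shares = {}
    for year, base in year_counts.items():
        shares[year] = combined_counts.get(year, 0) / base if base else 0.0
    peak = max(shares.values(), default=0.0)
    if peak == 0:
        return dict.fromkeys(shares, 0.0)
    return {year: share / peak for year, share in shares.items()}


if __name__ == "__main__":
    # Illustrative, made-up counts; real ones would come from a search engine.
    combined = {1985: 10, 1986: 300, 1987: 50}
    bare = {1985: 1000, 1986: 1000, 1987: 500}
    for year, share in sorted(centuryshare(combined, bare).items()):
        print(year, "#" * int(round(share * 40)))
```

Dividing by the bare-year count is what keeps round-number years and the Web-era years from dominating every graph.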
Compare the Chernobyl results in Figure 1-29 with those for the
gentle rise and fall of disco in Figure 1-30.
Figure 1-30. The Centuryshare Calculator on the
bell (bottomed) curve of disco's reign
FindForward sports a host of other search features
(http://findforward.com/about/), including Amazon.com, IRC
logs, weblogs, assorted files people leave lying about on the
Web, people (famous and not), and things. For instance, you can
ask a question such as "When was Albert Einstein born?" and
FindForward will trawl the Web, figure it out (or something close
enough for horseshoes), and provide a link to the source, as
shown in Figure 1-31.
Figure 1-31. Ask a decent question...
Check the source out for yourself by clicking the "Check
source" link or find another by clicking "Find new answer."
Philipp Lenssen
Hack 18. Hack Your Own Google Search
Form
Build your own personal, task-specific Google search form.
If you want to do a simple search with Google, you don't need
anything but the standard Simple Search form (the Google home
page). But if you want to craft specific Google searches that
you'll be using on a regular basis or providing for others, you can
simply put together your own personalized search form.
Start with your garden-variety Google search form; something
like this will do nicely:
<!-- Search Google -->
<form method="get" action="http://www.google.com/search">
<input type="text" name="q" size=31 maxlength=255 value="">
<input type="submit" name="sa" value="Search Google">
</form>
<!-- Search Google -->
This is a very simple search form. It takes your query and sends
it directly to Google, adding nothing to it. But you can embed
some variables to alter your search as needed. You can do this
in two ways: via hidden variables or by adding more input to your
form.
1.30.1. Hidden Variables
As long as you know how to identify a search option in Google,
you can add it to your search form via a hidden variable. The fact
that it's hidden just means that form users will not be able to
alter it. They won't even be able to see it unless they take a look
at the source code. Let's take a look at a few examples.
While it's perfectly legal HTML to put your
hidden variables anywhere between the
opening and closing <form> tags, it's rather
tidy and useful to keep them all together
after all the visible form fields.
File Type
As the name suggests, file type filters your
results by a particular file type (e.g., Word .doc, Adobe
.pdf, PowerPoint .ppt, plain text .txt). Add a PowerPoint
file type filter, for example, to your search form, like so:
<input type="hidden" name="as_filetype" value="PPT">
Site Search
Narrows your search to specific sites. While a suffix like
.com will work just fine, something more fine-grained like
the example.com domain is probably better suited:
<input type="hidden" name="as_sitesearch" value="example.c
URL Component
Specifies a particular path component to look for in
URLs. This can include a domain name but doesn't have
to. The following tries to tease out documentation in
your result set:
<input type="hidden" name="hq" value="inurl:docs">
Date Range
Narrows your search to pages indexed within the stated
number of months. Acceptable values are between 1 and
12. Restricting our results to items indexed only within
the last seven months is just a matter of adding:
<input type="hidden" name="as_qdr" value="m7">
Number of Results
Indicates the number of results that you'd like appearing
on each page, specified as a value of num between 1 and
100; the following asks for 50 per page:
<input type="hidden" name="num" value="50">
What would you use this for? If you're regularly looking
for an easy way to create a search engine that finds
certain file types in a certain place, this works really
well. If this is a one-time search, you can always just
hack the results URL ["Understanding Google URLs"
earlier in this chapter], tacking the variables and their
associated values on to the URL of the results page.
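For the one-time case, tacking variables onto the URL is easy to script. The Python sketch below builds a results URL from the same variable names used in the hidden fields above (as_filetype, as_sitesearch, num). The function name google_search_url and the sample query leadership are made up for illustration, and whether Google still honors each variable is something to check for yourself.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://www.google.com/search"


def google_search_url(query, **options):
    """Build a results URL: the query first, then any extra variables."""
    params = [("q", query)] + list(options.items())
    return SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # PowerPoint files on example.com, 50 results per page.
    print(google_search_url("leadership",
                            as_filetype="ppt",
                            as_sitesearch="example.com",
                            num=50))
```

Each keyword argument plays the part of one hidden input field; urlencode takes care of escaping spaces and punctuation in the values.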
1.30.2. Mixing Hidden File Types: An
Example
The site tompeters.com (http://www.tompeters.com) contains
several PowerPoint (.ppt) files. If you want to find just the
PowerPoint files on their site, you'd have to figure out how their
site search engine works or pester them into adding a file type
search option. But you can put together your own search form
that finds PowerPoint presentations on the tompeters.com site.
Even though you're creating a handy
search form this way, you're still resting on
the assumption that Google's indexed
most or all of the site that you're
searching. Until you know otherwise,
assume that any search results Google
gives you are incomplete.
Your form looks s omething like:
<!-- Search Google for tompeters.com PowerPoints -->
<form method="get" action="http://www.google.com/search">
<input type="text" name="q" size=31 maxlength=255 value="">
<input type="submit" name="sa" value="Search Google">
<input type="hidden" name="as_filetype" value="ppt">
<input type="hidden" name="as_sitesearch" value="tompeters.com">
<input type="hidden" name="num" value="100">
</form>
<!-- Search Google for tompeters.com PowerPoints -->
Using hidden variables is handy when you want to search for one
particular thing all the time. But if you want to be flexible in what
you're searching for, creating an alternate form is the way to go.
1.30.3. Creating Your Own Google Form
Some variables best stay hidden; however, for other options, you
can let your form users be much more flexible.
Let's go back to the previous example. You want to let your
users search for PowerPoint files, but you also want them to be
able to search for Excel files and Microsoft Word files. In
addition, you want them to be able to search tompeters.com, the
State of California, or the Library of Congress. There are
obviously various ways to do this user-interface-wise; this
example uses a couple of simple pull-down menus:
<!-- Custom Google Search Form-->
<form method="get" action="http://www.google.com/search">
<input type="text" name="q" size=31 maxlength=255 value="">
<br />
Search for file type:
<select name="as_filetype">
<option value="ppt">PowerPoint</option>
<option value="xls">Excel</option>
<option value="doc">Word</option>
</select>
<br />
Search site:
<select name="as_sitesearch">
<option value="tompeters.com">TomPeters.com</option>
<option value="state.ca.us">State of California</option>
<option value="loc.gov">The Library of Congress</option>
</select>
<input type="hidden" name="num" value="100">
<input type="submit" value="Search Google">
</form>
<!-- Custom Google Search Form-->
FaganFinder (http://www.faganfinder.com/engines/google.shtml)
is a wonderful example of a thoroughly customized form.
Hack 19. Go Beyond Google's Advanced
Search
Soople augments the functionality of Google Advanced Search
with comprehensive yet easy-to-use forms for every Google
occasion.
Google may have started as a simple search engine, but it got
past that a long time ago. Now it offers calculators, converters,
number ranges, dictionaries, stock symbol search, and UPS
package tracking. That wealth of offerings makes it a great
handy reference tool, but who can remember all those search
syntaxes?
Well, pour yourself a nice hot bowl of Soople
(http://www.soople.com) and perform advanced searches with
ease. Soople provides several dozen search interfaces, each
geared toward a particular Google search feature or property
(see Figure 1-32). By providing you with prefabricated yet
flexible specialized search forms, Soople helps you concentrate
on your reference task and not on building syntax.
Figure 1-32. Soople's dozens of interfaces,
aimed at just the Google search you're after
The Soople main page opens with 13 different specialty
interfaces, allowing you to filter for particular file types, break
out searches by language, hunt down images, news, and
definitions, and more. And it doesn't stop there. At the top of the
page are five tabs: calculators, translation tools, phone and
location lookups, and a Superfilter. All of Soople's results appear
in regular Google results pages; there's no use of the Web API
here, just custom search forms pointing at Google proper.
As an example of just how detailed Soople gets, the calculator
page is actually 10 different forms covering different
mathematical functions from trigonometric functions to
conversion to finding percentages and roots.
1.31.1. See Also
Soople takes building custom search forms to an
extreme (and very useful) conclusion; see [Hack #18]
for more inspiration.
Check out [Hack #48] for specialty forms with a narrow
focus.
Hack 20. Use Google Tools for
Translators
Create a customized search form for language translation.
If you do a lot of the same kind of research every day, you might
find that a customized search form makes your job easier. If you
spend enough time on it, you may find that it's elaborate enough
that other people may find it useful as well.
WWW Search Interfaces for Translators
(http://www.multilingual.ch/search_interfaces.htm) offers four
amazing tools for finding material of use to translators. Created
by Tanya Harvey Ciampi from Switzerland, the tools are
available in AltaVista and Google flavors. A user-defined query
term is combined with a set of specific search criteria to narrow
down the search to yield highly relevant results.
The first tool, shown in Figure 1-33, finds glossaries. The
pull-down menu finds synonyms of the word "glossary" in various
parts of a search result (title, URL, or anywhere). For example,
imagine having to seek out numerous specialized computer
dictionaries before finding one containing a definition of the term
"firewall." This glossary search tool spares you the work by
setting a clear condition: "Find a glossary that contains my
term!"
Figure 1-33. Digging into Google's trove of
glossaries
If you're getting too many results for the glossary word you
searched for, try searching for it in the title of the results
instead, for example, intitle:firewall rather than firewall.
The second tool, shown in Figure 1-34, finds "parallel texts,"
identical pages in two or more languages, useful for multilingual
terminology research.
Figure 1-34. Matching parallel texts
Finding pages in two or more languages is not easy; one of the
few places to do it easily is with Canadian government pages,
which are available in French and English. This tool provides
several different search combinations between SL (source
language) and TL (target language).
The first set of searches defaults to Google, though you can
search AltaVista instead, if you prefer. It provides several
language sets (English-German, English-Spanish,
English-French, etc.) and gives you options for searching in each one (SL
in URL, link to TL, page in TL country, etc.).
The second set of searches also offers several language sets
and several ways to search them (three different ways to search
for the source language in the URL, keyword on the page in the
target language, etc.). In some cases, this tool also lets you
specify the country for the target language (for example, French
could be a target language in Canada, France, or Switzerland).
The third tool, shown in Figure 1-35, finds variations on the word
"abbreviations" in the title or URL of a search result to find lists
of abbreviations.
Figure 1-35. Finding abbreviations (make that
abbv., or is it abbrev.?)
A fourth tool (Figure 1-36) searches for idioms (I can never
quite remember: is it "feed a cold, starve a fever" or "feed a
fever, starve a cold"?), proverbs, and slang.
Figure 1-36. Finding idioms, proverbs, and slang
These search tools are available in several languages and do a
lot of work for translators; in fact, they pull out so much
information that you might think they'd require the Google API.
But they don't; the query is generated on the client side and
then passed to Google.
It's accomplished quite elegantly. First, take a look at the
source code for the form and see if you notice anything. Here's a
hint: pay attention to the form element names. Notice that this
hack integrates search synonyms without having to use the
Google API or any kind of CGI. Everything's done via the form.
<!-- Initializing the form and opening a Google search
in a new window -->
<form method="GET" target="_blank"
action="http://www.google.com/search">
<!-- Taking the keyword search specified by the user -->
<input type="text" name="q" size="12">
<select name="q" size="1">
<!-- This is the cool stuff. These options provide several
different modifiers designed to catch glossaries
in Google. -->
<option selected value="intitle:dictionary OR intitle:glossary
OR intitle:lexicon OR intitle:definitions">
synonyms of "glossary" in TITLE - 1</option>
<option value="intitle:terminology OR intitle:vocabulary
OR intitle:definition OR intitle:jargon">
synonyms of "glossary" in TITLE - 2</option>
<option value="inurl:dictionary OR inurl:glossary OR inurl:lexicon
OR inurl:definitions">
synonyms of "glossary" in URL - 1</option>
<option value="inurl:terminology OR inurl:vocabulary
OR inurl:definition
OR inurl:jargon">synonyms of "glossary" in URL - 2</option>
<option value="inurl:dict OR inurl:gloss OR inurl:glos
OR inurl:dic">
abbreviations for "glossary" in URL</option>
<option value="dictionary OR glossary OR lexicon
OR definitions">synonyms of "glossary" ANYWHERE</option>
</select>
<!-- Ending the submission form. -->
<input type="submit" value="Find">
<input type="reset" value="Reset" name="B2">
</form>
The magic at work here is to be found in the following two lines:
<input type="text" name="q" size="12">
<select name="q" size="1">
Notice that both the query text field and glossary pop-up menu
are named the same thing: name="q". When the form is submitted
to Google, the values of both fields are effectively combined and
treated as one query. So entering a query of dentistry and
selecting synonyms of "glossary" in TITLE - 1 from the pop-up
menu results in a combined Google query of:
dentistry intitle:dictionary OR intitle:glossary OR intitle:lexico
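You can watch the same-name trick at work without a browser. The Python sketch below encodes two fields that are both named q, the way a form submission would, then reads them back and joins them. The function names are made up, the menu value is shortened for brevity, and joining the values with a space is only a model of the "effectively combined" behavior described above, not a statement about Google's internals.

```python
from urllib.parse import urlencode, parse_qs


def submit(fields):
    """Encode (name, value) pairs in order, as a GET form submission does."""
    return urlencode(fields)


def combined_query(query_string, name="q"):
    """Join every value sent under the same field name into one query."""
    return " ".join(parse_qs(query_string).get(name, []))


if __name__ == "__main__":
    fields = [("q", "dentistry"),
              ("q", "intitle:dictionary OR intitle:glossary")]
    qs = submit(fields)
    print(qs)                  # both values travel under the name q
    print(combined_query(qs))  # and read back as one query
```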
1.32.1. Hacking the Hack
This hack uses customized Google forms as an interface for
translators, but you could use this idea for just about anything.
Do you need to find legal statutes? Financial materials?
Information from a particular vertical market? Anything that has
its own specialized vocabulary that you can add to a form can be
channeled into a hack like this. What kind of interface would you
design?
Chapter 2. Advanced Web
Hacks 21-49
If you've just arrived from Chapter 1 and think that you have
more than enough information to Google yourself silly, hold on to
your hat. Now you'll put into high gear all that you've learned
about the ins and outs of Googling.
In this chapter you'll meander your Google neighborhood, range
farther across the Web, dig deeper into individual sites, twist and
recombine your queries, squeeze the last drop of results out of
every search, and even go beyond the bounds of Google's
index, all without wearing out your fingers.
Because you'll get your computer to do the lion's share of the
work for you.
This chapter hacks Google programmatically. Through bite-sized
programs, we'll introduce you to the kind of trawling, crawling,
and recombination that's possible with just a few lines of code.
And it's all possible thanks to something called the Google
API (that's Application Programming Interface, or Google for
computers).
In April 2002, Google announced an alternate interface to the
friendly search box you see on Google.com. They opened up
their index to anyone with a little programming know-how and a
reasonable amount of patience. Initially, this wasn't much to
write home about. Some of the earliest applications simply
Googled and incorporated the results into a web page: so-called
Google boxes [Hack #22]. But as more people experimented
with the API, the variety of applications grew from the marginally
interesting to the seriously useful. And so was born the book
that you're holding in your hands.
This chapter and the rest of this book contain hacks that take
advantage of this alternate interface. Some simply automate the
sorts of tasks that might take you forever and a day to do by
hand. Others run automatically to keep tabs on searches (and
results) of interest to you. And still others provide a bird's-eye
view of your results in context, which is just not possible by
eyeballing any number of results pages.
2.2. Assumptions
These hacks do assume a little more than an adventurous spirit
and a researcher's tenacity. We assume that you already have
some programming background or are willing to learn the basics
as you go along. In fact, we've been happy to hear about so
many readers picking up and learning a little programming
through the hacks in the previous edition of this book; learning
to program is so much easier if you have a particular task in
mind.
You'll need to type in (or download) programs or scripts and run
them from the command line (that's Terminal in Mac OS X or the
DOS command window in Windows). Some are run as CGI
scripts, bits of dynamic content running on your web site and
talked to through your web browser. For more information on
running hacks on the command line and as CGI scripts in your
browser, see "How to Run the Hacks" in the Preface.
Almost all of the hacks are written in Perl (http://www.perl.com),
with a few Python (http://www.python.org), PHP
(http://www.php.net), Java (http://java.sun.com), and .NET
(http://www.microsoft.com/net) programs sprinkled throughout.
To run a particular hack, you'll need the appropriate language to
be available on your computer. Since instruction on installing
and using these languages is beyond the scope of this book, you
should start with a visit to the language's home page and might
consider picking up a copy of one of O'Reilly's fine selection of
books (http://www.oreilly.com). Learning Perl by Randal L.
Schwartz and Tom Phoenix (O'Reilly) will be particularly useful.
Most of the hacks use the Google API. For an introduction to the
programmatic side of Google, a detailed walkthrough of the
Google API, and examples of programming Google using Perl,
Python, PHP, Java, and .NET, turn to Chapter 9.
There are also a few hacks that involve spidering or screen
scraping (which is essentially using your program to read a site's
web pages and extract salient information) to get to data that are
either not available through the Google API or on another site
entirely. If spidering appeals to you, you might also check out
Spidering Hacks by Kevin Hemenway and Tara Calishain
(O'Reilly).
Hack 21. Like a Version
Gather a list of what Google thinks are synonyms for a keyword
you provide.
The Google ~ synonym operator ["Special Syntax" in Chapter 1]
widens your search criteria to include not only the specific
keywords in your search, but also words Google has found to be
synonyms of, or at least in some way related to, your query
words. So while, for example, food facts may only match a
handful of pages of interest to you, ~food ~facts seeks out
nutrition information, cooking trivia, and more. And finding these
synonyms is an entertaining and potentially useful exercise in
and of itself. Here's one way...
Let's say we're looking for all the synonyms for the word "car."
First, we search Google for ~car to find all the pages that contain
a synonym for "car." In its search results, Google highlights
synonyms in bold, just as it highlights regular keyword matches.
Scanning the results (the second page is shown in Figure 2-1)
for ~car finds car, cars, motor, auto, BMW, and other synonyms in
boldface.
Figure 2-1. ~car turns up bolded synonyms in
Google search results
Now let's focus on the synonyms rather than our original
keyword, "car." We'll do so by excluding the word "car" from our
query, like so: ~car -car. This saves us from having to wade
through page after page of matches for the word "car."
Once again, we scan the search results for new synonyms. (I ran
across automotive, racing, vehicle, and motor.)
Make a note of any new bolded synonyms and subtract them
from the query (e.g., ~car -car -automotive -racing -vehicle
-motor) until you hit Google's 10-word limit ["The 10-Word Limit"
in Chapter 1], after which Google starts ignoring any additional
words that you tack on.
In the end, you'll have compiled a goodly list of synonyms, some
of which you'd not have found in your typical thesaurus thanks to
Google's algorithmic approach to synonyms.
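The subtract-and-repeat step is easy to get wrong by hand once the list grows. As a rough illustration in Python, the function below builds the exclusion query and stops adding words at the 10-word limit. The name exclusion_query is made up, and counting each excluded term as one word toward the limit is an assumption.

```python
def exclusion_query(word, synonyms, limit=10):
    """Build "~word -syn1 -syn2 ..." without exceeding the word limit."""
    terms = ["~" + word]
    for syn in synonyms:
        if len(terms) >= limit:
            break  # assumption: anything past the limit would be ignored
        terms.append("-" + syn)
    return " ".join(terms)


if __name__ == "__main__":
    found = ["car", "automotive", "racing", "vehicle", "motor"]
    print(exclusion_query("car", found))
```

With the five synonyms from the example, it prints ~car -car -automotive -racing -vehicle -motor.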
2.3.1. The Code
If you think this all sounds a little tedious and more in the job
description of a computer program, you'd be right. Here's a short
Python script to do all the iteration for you. It takes in a starting
word and spits out a list of synonyms that it accrues along the
way.
You'll need the PyGoogle [Hack #98]
library to provide an interface to the
Google API.
#!/usr/bin/python
# Available at http://www.aaronsw.com/2002/synonyms.py
import re
import google # get at http://pygoogle.sourceforge.net/

# Matches anything Google wrapped in bold tags in a result snippet
sb = re.compile('<b>(.*?)</b>', re.DOTALL)

def stripBolds(text, syns):
    # Collect each new bolded term, skipping the "..." ellipsis marker
    for t in sb.findall(text):
        t = t.lower().encode('utf-8')
        if t != '...' and t not in syns: syns.append(t)
    return syns

def findSynonyms(q):
    if ' ' in q: raise ValueError("query must be one word")
    query = "~" + q
    syns = []
    # Keep excluding found synonyms until the 10-word limit is reached
    while (len(query.split(' ')) <= 10):
        for result in google.doGoogleSearch(query).results:
            syns = stripBolds(result.snippet, syns)
        added = False
        for syn in syns:
            if syn in query: continue
            query += " -" + syn
            added = True
            break
        if not added: break # nothing left
    return syns

if __name__ == "__main__":
    import sys
    if len(sys.argv) != 2:
        print("Usage: python " + sys.argv[0] + " query")
    else:
        print(findSynonyms(sys.argv[1]))
Save the code as synonyms.py.
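If you don't have a working API key or PyGoogle handy, you can still try the bold-extraction step on its own. This standalone Python 3 sketch mirrors what the script does with each result snippet. The name bold_terms and the sample snippet are made up for illustration.

```python
import re

# Anything wrapped in <b>...</b>, even across line breaks.
BOLD = re.compile(r"<b>(.*?)</b>", re.DOTALL)


def bold_terms(snippet, seen=None):
    """Add each new lowercased bolded term in snippet to the seen list."""
    seen = [] if seen is None else seen
    for term in BOLD.findall(snippet):
        term = term.lower()
        if term != "..." and term not in seen:
            seen.append(term)
    return seen


if __name__ == "__main__":
    sample = "Used <b>cars</b> and <b>Auto</b> parts <b>...</b> every <b>auto</b> dealer"
    print(bold_terms(sample))
```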
2.3.2. Running the Hack
Call the script on the command line ["How to Run the Hacks" in
the Preface], passing it a starting word to get it going, like so:
% python synonyms.py car
2.3.3. The Results
You'll get back a list of synonyms like these:
['auto', 'cars', 'car', 'vehicle', 'automotive', 'bmw', 'motor', '
'toyota']
Aaron Swartz
Hack 22. Capture Google Results in a Google Box
Add a little box of Google results to any page in your web site.
A Google box is a small HTML snippet that shows Google search
results for whatever you're searching for. You might wish to
display on your web page a box of pages similar to yours, pages
that link to yours, or the top hits for a search that might be of
interest to your readers.
Google boxes as a concept (the idea of taking a shortened version
of Google results and integrating them into a web page or some
other place) are not new. In fact, they're on their way to becoming
ubiquitous when it comes to weblog and content management
software. The Google box is easy to implement and was one of
the first examples of Google API usage. As such, it enjoys the
position of proto-application: a lot of developers whip up a Google
box just to see if they can. Do a Google search for Google Box to
see some other examples of Google boxes for different
languages and applications.
What goes in a Google box, anyway? Why would anybody want to
integrate one into a web page?
It depends on the page. Putting a Google box that searches for
your name onto a weblog provides a bit of an ego boost and can
give a little more information about you without seeming like
bragging (yeah, right). If you have a topic-specific page, set up a
Google box that searches for the topic (the more specific, the
better the results). And if you've got a general news-type page,
consider adding a Google box for the news topic. Google boxes
can go pretty much anywhere, with Google updating its index
often enough that the content of a Google box stays fresh.
2.4.1. The Code
Here's a classic piece of Perl code to produce a Google box as a
regular text file filled with garden-variety HTML code, suitable
for incorporating into any web page.
#!/usr/local/bin/perl
# google_box.pl
# A classic Google box implementation.
# Usage: perl google_box.pl <query> [# results]

# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";

use strict;
use SOAP::Lite;

# Bring in those command-line arguments; the result count is optional.
@ARGV == 1 or @ARGV == 2
  or die "Usage: perl google_box.pl <query> [# results]\n";
my($query, $maxResults) = @ARGV;

# Default to 10 results when the count is missing or out of range.
$maxResults = 10
  if (!defined $maxResults or $maxResults < 1 or $maxResults > 10);

# Create a new SOAP::Lite instance, feeding it GoogleSearch.wsdl.
my $google_search = SOAP::Lite->service("file:$google_wdsl");

# Query Google.
my $results = $google_search ->
  doGoogleSearch(
    $google_key, $query, 0, $maxResults, "false", "",
    "false", "", "latin1", "latin1"
  );

# No results?
@{$results->{resultElements}} or die "no results";

# Print one link per result, falling back to the URL when there is no title.
print join "\n",
  map( {
    qq{<a href="$_->{URL}">} .
    ($_->{title} || $_->{URL}) .
    qq{</a> <br />}
  } @{$results->{resultElements}} );
Save the code to a file called google_box.pl. Be sure to replace
insert key here, in the $google_key line near the top of the
script, with your personal Google API key.
2.4.2. Running the Hack
This Google box takes two bits of information on the command
line ["How to Run the Hacks" in the Preface]: the query you want
to run and the maximum number of results you'd prefer (up to 10).
If you don't provide the number of results, the Google box will
default to 10. Run it as follows:
% perl google_box.pl "query" # of results
where query is the search query you'd like to run against Google
and # of results is the maximum number of results you want it to
return.
This will print the results to the screen. To save them to a text
file for inclusion in your web pages, specify the name of a file to
save the results to, like so:
% perl google_box.pl "query" # of results > google_box.html
You can leave out specifying # of results
and the script will default to 10 results in
your Google box.
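To see what the script writes without calling the API, here is a
rough Python sketch of the same output step. It assumes each result
is a dictionary with URL and title keys, mirroring the field names
the Perl code reads; the function name render_google_box and the
sample data are hypothetical.

```python
from html import escape

def render_google_box(results):
    """Return one '<a href=...>title</a> <br />' line per result."""
    lines = []
    for result in results:
        url = result["URL"]
        # Fall back to the URL when a result has no title.
        title = result.get("title") or url
        # The title is left unescaped because the API returns it as HTML;
        # only the URL is escaped for use inside the href attribute.
        lines.append('<a href="%s">%s</a> <br />'
                     % (escape(url, quote=True), title))
    return "\n".join(lines)

# Made-up results, for illustration only.
sample = [
    {"URL": "http://example.com/a", "title": "Page A"},
    {"URL": "http://example.com/b", "title": ""},
]
print(render_google_box(sample))
```

Escaping the URL is an addition of this sketch; the Perl script
prints the URL exactly as Google returns it.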
2.4.3. The Results
Here's a sample Google box for "camel book", referring to
O'Reilly's popular Programming Perl title:
<a href="http://www.oreilly.com/catalog/pperl2/">oreilly.com --
Online Catalog:Programming Perl, 2nd Edition</a> <br />
<a href="http://www.oreilly.com/catalog/pperl3/">oreilly.com --
Online Catalog:Programming Perl, 3rd Edition</a> <br />
<a href="http://www.oreilly.com/catalog/pperl2/noframes.html">Prog
Perl, 2nd Edition</a> <br />
<a href="http://www.tuxedo.org/~esr/jargon/html/entry/Camel-Book.h
<a href="http://www.cise.ufl.edu/perl/camel.html">The Camel Book<a
2.4.4. Integrating a Google Box
When you incorporate a Google box into your web page, you'll
have two considerations: refreshing the content of the box
regularly and integrating the content into your web page. To
refresh the content of the box, you'll need to run the program
regularly using something like cron under Unix or the Windows
Scheduler.
To include the content on your web page, Server Side Includes
(SSI) is always rather effective. With SSI, including a Google
box takes little more than something like this (note that there is
no space between <!-- and #include):
<!--#include virtual="./google_box.html" -->
For more information on using Server Side Includes, check
out the NCSA SSI Tutorial
(http://hoohoo.ncsa.uiuc.edu/docs/tutorials/includes.h
or search Google for Server Side Includes Tutorial.
Google boxes are a nice addition to your web pages, whether you
run a weblog or a news site. But for many Google box searches,
the search results won't change that often, especially for more
common search words.
2.4.5. Making the Google Box Timely
As you might remember, Google has a daterange: search syntax
available. This version of the Google box takes advantage of the
daterange: [Hack #16] syntax, allowing you to specify how many
days back you want your query to run. If you don't provide a
number, the default is 1, and there's no maximum. I wouldn't go
back much further than a month or so. The fewer days back you
go, the more often the results in the Google box will change.
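The daterange: syntax wants Julian day numbers rather than calendar
dates. As a rough illustration of the arithmetic (in Python rather
than the Perl used by the hack), the sketch below converts a date and
builds the query string. The helper names are invented for this
example, and the offset 1721425 is simply the difference between
Python's date ordinal and the Julian day number.

```python
from datetime import date, timedelta

# Julian day number = Python's proleptic Gregorian ordinal + 1721425.
JDN_OFFSET = 1721425

def julian_day(d):
    """Return the Julian day number for a calendar date."""
    return d.toordinal() + JDN_OFFSET

def daterange_query(query, days_back=1, today=None):
    """Build a query restricted to the single day `days_back` days ago."""
    today = today or date.today()
    if days_back <= 0:
        days_back = 1  # same default as the hack
    day = julian_day(today - timedelta(days=days_back))
    return "%s daterange:%d-%d" % (query, day, day)

print(julian_day(date(2000, 1, 1)))
print(daterange_query("google hacks", 1, today=date(2000, 1, 2)))
```

Like the Perl script that follows, this sketch uses the same day for
both ends of the range, so a larger days_back value moves the
one-day window further into the past rather than widening it.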
You'll need the Time::JulianDay module to get
this hack rolling
(http://search.cpan.org/search?query=time%3A%3Ajulianday).
2.4.5.1 The code
The code is essentially identical to that of the classic Google
box, save the additional bits to accept and deal with a date
range on the command line and build a daterange: query:
#!/usr/local/bin/perl
# timebox.pl
# A time-specific Google box.
# Usage: perl timebox.pl <query> [# results] [# days back]

# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";

use strict;
use SOAP::Lite;
use Time::JulianDay;

# Bring in those command-line arguments; the last two are optional.
@ARGV >= 1 and @ARGV <= 3
  or die "Usage: perl timebox.pl <query> [# results] [# days back]\n";
my($query, $maxResults, $daysBack) = @ARGV;
$maxResults = 10
  if (!defined $maxResults or $maxResults < 1 or $maxResults > 10);
$daysBack = 1 if (!defined $daysBack or $daysBack <= 0);

# Figure out the Julian date of the day $daysBack days ago
# (yesterday, by default).
my $yesterday = int local_julian_day(time) - $daysBack;

# Create a new SOAP::Lite instance, feeding it GoogleSearch.wsdl.
my $google_search = SOAP::Lite->service("file:$google_wdsl");

# Query Google.
my $results = $google_search ->
  doGoogleSearch(
    $google_key, "$query daterange:$yesterday-$yesterday", 0,
    $maxResults, "false", "", "false", "", "latin1", "latin1"
  );

# No results?
@{$results->{resultElements}} or die "no results";

print join "\n",
  map( {
    qq{<a href="$_->{URL}">} .
    ($_->{title} || $_->{URL}) .
    qq{</a> <br />}
  } @{$results->{resultElements}} );
Save the code to a text file named timebox.pl. And, again, don't
forget to replace insert key here with your Google API key.
2.4.5.2 Running the hack
You'll have to provide three bits of information on the command
line: the query you want to run, the maximum number of results
you'd prefer (up to 10), and the number of days back that Google
should consider:
% perl timebox.pl "query" # of results # days back
Replace query with your search query, # of results with the
number of results you'd like (up to 10), and # days back with the
number of days back you'd like to search for results.
Again, to send the results to a text file rather than the screen,
call the script like this:
% perl timebox.pl "query" # of results # days back > google_box.h
You can leave out specifying # of results
and # days back and the script will default
to 10 results and one day back,
respectively.
2.4.5.3 The results
Here's a sample Google box for the top five "google hacks"
results (this book included, hopefully), indexed the day before
the time of this writing:
% perl timebox.pl "google hacks" 5 1
<a href="http://isbn.nu/0596004478">Google Hacks</a> <br />
<a href="http://isbn.nu/0596004478/shipsort">Google Hacks</a> <br
<a href="http://isbn.nu/0596004478/amazonca">Amazon.ca: Google Hac
<a href="http://www.oreilly.de/catalog/googlehks/">Google Hacks</a
<a href="http://www.oreilly.de/catalog/googlehks/author.html">Goog
2.4.5.4 Hacking the hack
Perhaps you'd like your Google box to reflect "this day in 1999."
No problem for this slightly tweaked version of the Timely
Google box:
#!/usr/local/bin/perl
# timebox_thisday.pl
# A Google box for this day in <year>
# Usage: perl timebox_thisday.pl <query> <# results> [year]

# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";

use strict;
use SOAP::Lite;
use Time::JulianDay;

my @now = localtime(time);

# Bring in those command-line arguments; the year is optional.
@ARGV == 2 or @ARGV == 3
  or die "Usage: perl timebox_thisday.pl <query> <# results> [year]\n";
my($query, $maxResults, $year) = @ARGV;
$maxResults = 10 if ($maxResults < 1 or $maxResults > 10);

# Fall back to 1999 when no four-digit year is given.
(defined $year and $year =~ /^\d{4}$/) or $year = 1999;

# Figure out when this day in the specified year is.
# localtime counts months from 0; julian_day expects 1 through 12.
my $then = int julian_day($year, $now[4] + 1, $now[3]);

# Create a new SOAP::Lite instance, feeding it GoogleSearch.wsdl.
my $google_search = SOAP::Lite->service("file:$google_wdsl");

# Query Google.
my $results = $google_search ->
  doGoogleSearch(
    $google_key, "$query daterange:$then-$then", 0,
    $maxResults, "false", "", "false", "", "latin1", "latin1"
  );

# No results?
@{$results->{resultElements}} or die "no results";

print join "\n",
  "$query on this day in $year<p />",
  map( {
    qq{<a href="$_->{URL}">} .
    ($_->{title} || $_->{URL}) .
    qq{</a> <br />}
  } @{$results->{resultElements}} );
2.4.5.5 The results
The hacked version of the timely Google box runs just like the first
version, except that you specify the maximum number of results
and a year. Going back further than 1999 doesn't yield
particularly useful results given that Google came online in
1998.
Let's take a peek at how Netscape was doing in 1999:
% perl timebox_thisday.pl "netscape" 5 1999
netscape
on this day in 1999:<p />
<a href="http://www.showgate.com/aol.html">WINSOCK.DLL and NETSCAP
AOL Members</a> <br />
<a href="http://www.univie.ac.at/comment/99-3/993_23.orig.html">Co
- Netscape Communicator</a> <br />
<a href="http://www.ac-nancy-metz.fr/services/docint/netscape.htm"
</a> <br />
<a href="http://www.ac-nancy-metz.fr/services/docint/Messeng1.htm"
Courrier électronique avec Netscape Messenger</a> <br />
<a href="http://www.airnews.net/anews_ns.htm">Setting up Netscape
Airnews Proxy News</a> <br />
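The "this day in a given year" calculation has one wrinkle worth
spelling out: February 29 does not exist in most years. The Python
sketch below is an illustration only, with invented helper names; it
falls back to February 28 in that case, which is a choice made for
this example and not something the Perl script does.

```python
from datetime import date

def this_day_in(year, today=None):
    """Return today's month and day in `year`, as a date."""
    today = today or date.today()
    try:
        return today.replace(year=year)
    except ValueError:
        # February 29 in a non-leap year: use February 28 instead.
        return today.replace(year=year, day=28)

def julian_day(d):
    """Julian day number for a calendar date (ordinal + 1721425)."""
    return d.toordinal() + 1721425

then = this_day_in(1999, today=date(2004, 12, 1))
print("netscape daterange:%d-%d" % (julian_day(then), julian_day(then)))
```

Note that Python's date counts months from 1, so no adjustment is
needed here of the kind Perl's localtime requires.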
Hack 23. Build Google Directory URLs
Use ODP category information to build URLs for the Google
Directory.
The Google Directory (http://directory.google.com) overlays the
Open Directory Project (ODP or DMOZ, http://www.dmoz.org)
ontology onto the Google core index. The result is a Yahoo!-like
directory hierarchy of search results and their associated
categories with the added magic of Google's popularity
algorithms.
The ODP opens its entire database of listings to anybody,
provided you're willing to download a 283 MB file (and
that's compressed!). While you're probably not interested in all
the individual listings, you might want particular ODP
categories, or you may be interested in watching new listings
flowing into certain categories.
Unfortunately, the ODP does not offer a way to search by
keyword for sites added within a recent time period. So instead of
searching for recently added sites, the best way to get new site
information from the ODP is to monitor categories.
Because the Google Directory builds its directory based on the
ODP information, you can use the ODP category hierarchy
information to generate Google Directory URLs. This hack
searches the ODP category hierarchy information for keywords
that you specify, and then builds Google Directory URLs and
checks to make sure that they're active.
You'll need to download the category hierarchy information from
the ODP to get this hack to work. The compressed file
containing this information is available from
http://dmoz.org/rdf.html, and the specific file is here:
http://dmoz.org/rdf/structure.rdf.u8.gz. Before using it, you must
uncompress it with a decompression application specific to your
operating system. In the Unix environment, the command looks
something like this:
% gunzip structure.rdf.u8.gz
Bear in mind that the full category
hierarchy is over 35 MB. If you just want
to experiment with the structure, you can
get an excerpt from
http://dmoz.org/rdf/structure.example.txt.
This version is a plain text file and does
not require uncompressing.
2.5.1. The Code
Save the following code to a text file called google_dir.pl:
#!/usr/bin/perl
# google_dir.pl
# Uses ODP category information to build URLs into the Google Directory.
# Usage: perl google_dir.pl "keywords" < structure.rdf.u8

use strict;
use LWP::Simple;

# Turn off output buffering.
$|++;

my $directory_url = "http://directory.google.com";

@ARGV == 1
  or die qq{usage: perl google_dir.pl "{query}" < structure.rdf.u8\n};

# Grab those command-line specified keywords and build a regular
# expression that matches any one of them.
my $keywords = shift @ARGV;
$keywords =~ s!\s+!\|!g;

# A place to store topics.
my %topics;

# Loop through the DMOZ category file, printing matching results.
# The (?:...) group keeps the keyword alternation inside the quoted
# category path. LWP::Simple's head() could be called on each URL
# here to confirm that it is live before printing it.
while (<>) {
  /"(Top\/.*(?:$keywords).*)"/i and !$topics{$1}++
    and print "$directory_url/$1\n";
}
2.5.2. Running the Hack
Run the script from the command line ["How to Run the Hacks"
in the Preface], along with a query and the piped-in contents of
the DMOZ category file:
% perl google_dir.pl "keywords" < structure.rdf.u8
Replace keywords with the particular keywords that you're after.
If you're using the shorter category excerpt
structure.example.txt, use this:
% perl google_dir.pl "keywords" < structure.example.txt
2.5.3. The Results
Feeding the keyword mosaic into this hack would look something
like this:
% perl google_dir.pl "mosaic" < structure.rdf.u8
http://directory.google.com/Top/Arts/Crafts/Mosaics
http://directory.google.com/Top/Arts/Crafts/Mosaics/Glass
http://directory.google.com/Top/Arts/Crafts/Mosaics/Ceramic_and_Br
http://directory.google.com/Top/Arts/Crafts/Mosaics/Associations_a
http://directory.google.com/Top/Arts/Crafts/Mosaics/Stone
http://directory.google.com/Top/Shopping/Crafts/Mosaics
http://directory.google.com/Top/Shopping/Crafts/Supplies/Mosaics
...
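For readers who want to experiment with the matching logic outside
Perl, here is a rough Python equivalent. It is a sketch under stated
assumptions: the sample lines only imitate the look of the ODP
structure file (the real file's exact layout is not reproduced here),
and the function name directory_urls is invented.

```python
import re

DIRECTORY_URL = "http://directory.google.com"

def directory_urls(lines, keywords):
    """Return Google Directory URLs for categories matching any keyword."""
    # Space-separated keywords become an alternation: "a b" -> (?:a|b).
    alternation = "|".join(re.escape(k) for k in keywords.split())
    pattern = re.compile(r'"(Top/[^"]*(?:%s)[^"]*)"' % alternation,
                         re.IGNORECASE)
    seen = set()
    urls = []
    for line in lines:
        match = pattern.search(line)
        # Report each category path only once.
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            urls.append("%s/%s" % (DIRECTORY_URL, match.group(1)))
    return urls

# Hypothetical lines in the style of the ODP category file.
sample = [
    '<Topic r:id="Top/Arts/Crafts/Mosaics">',
    '<narrow r:resource="Top/Arts/Crafts/Mosaics/Glass"/>',
    '<Topic r:id="Top/Arts/Crafts/Mosaics">',
    '<Topic r:id="Top/Arts/Music">',
]
for url in directory_urls(sample, "mosaic"):
    print(url)
```

Unlike the Perl pattern, this one uses [^"]* so that a match cannot
run past the closing quote of the category path.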
2.5.4. Hacking the Hack
There isn't much hacking that you can do to this hack; it's
designed to take ODP data, create Google URLs, and verify
those URLs. How well you can get this to work for you really
depends on the types of search words that you choose.
Choose words that are more general. If you're interested in a
particular state in the U.S., for example, choose the name of the
state and major cities, but don't choose the name of a very small
town or of the governor. Choose the name of a company but not
of its CFO. A good rule of thumb is to choose the keywords that
you might find as entry names in an encyclopedia or almanac.
You can easily imagine finding a company name as an
encyclopedia entry, but it's a rare CFO who could achieve the
same.
Hack 24. Find Recipes
Let the Google API transform those random ingredients in your
fridge into a wonderful dinner.
Google can help you find news, catalogs, discussions, web
pages, and so much more, and it can also help you figure out what
to have for dinner tonight!
This hack uses the Google API to help you transform those
random ingredients in your fridge into a wonderful dinner. Well,
you do have to do some of the work. But it all starts with this
hack.
2.6.1. The Code
This hack comes with a built-in form that calls the query and the
recipe type, so there's no need to set up a separate form:
#!/usr/local/bin/perl
# goocook.cgi
# Finding recipes with google.
# goocook.cgi is called as a CGI with form input.

# Your Google API developer's key
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";

use SOAP::Lite;
use CGI qw/:standard/;

# Each recipe type maps to a set of sites to search.
my %recipe_types = (
  "General" => "site:allrecipes.com | site:cooking.com | site:epicurious.com | site:recipesource.com",
  "Vegetarian/Vegan" => "site:fatfree.com | inurl:veganmania | inurl:vegetarianrecipe | inurl:veggiefiles",
  "Worldwide Cuisine" => "site:Britannia.org | inurl:thegutsygourme | inurl:simpleinternet | inurl:soupsong"
);

# Print the search form.
print
  header(),
  start_html("GooCook"),
  h1("GooCook"),
  start_form(-method=>'GET'),
  'Ingredients: ', textfield(-name=>'ingredients'),
  br(),
  'Recipe Type: ', popup_menu(-name=>'recipe_type',
    -values=>[keys %recipe_types], -default=>'General'),
  br(),
  submit(-name=>'submit', -value=>"Get Cookin'!"),
  submit(-name=>'reset', -value=>"Start Over"),
  end_form(), p();

# If ingredients were submitted, query Google and list the results.
if (param('ingredients')) {
  my $google_search = SOAP::Lite->service("file:$google_wdsl");
  my $results = $google_search ->
    doGoogleSearch(
      $google_key,
      param('ingredients') . " " . $recipe_types{param('recipe_type')},
      0, 10, "false", "", "false", "", "latin1", "latin1"
    );
  @{$results->{'resultElements'}} or print "None";
  foreach (@{$results->{'resultElements'}}) {
    print p(
      b($_->{title}||'no title'), br(),
      a({href=>$_->{URL}}, $_->{URL}), br(),
      i($_->{snippet}||'no snippet')
    );
  }
}
print end_html();
Save the code as a CGI script ["How to Run the Hacks" in the
Preface] named goocook.cgi in your web site's cgi-bin directory.
2.6.2. Running the Hack
This hack runs as a CGI script, producing a dynamic web page
alongside the rest of the pages in your web site. Since just where
you place and how you run CGI scripts varies from server to
server and ISP to ISP, you're best left to ask your administrator
or provider for help.
Once the script is in place, call it by pointing your web browser
at goocook.cgi, fill in the ingredients you have on hand, select a
recipe type, and hit the "Get Cookin'!" button.
2.6.3. Hacking the Hack
Of course, the most obvious way to hack this hack is to add new
recipe options to it. That involves first finding new recipe sites,
and then adding them to the hack.
Adding new recipe sites entails finding the domains that you
want to search. Use the cooking section of the Google Directory
to find recipes, starting here:
http://directory.google.com/Top/Home/Cooking/Recipe_Collection
Next, find what you want and build it into a query supplement like
the one in the form, surrounded by parentheses with each item
separated by a |. Remember, using the site: syntax means that
you'll be searching an entire domain, so if you find a great
recipe site at
http://www.geocities.com/reallygreat/food/recipes/, don't use
the site: syntax to search it; use the inurl: search instead
(inurl:geocities.com/reallygreat/food/recipes). Just remember
that an addition like this counts heavily against your 10-word
query limit.
Let's look at an example. The cookbook section of the Google
Directory has a seafood section with several sites. Let's pull out
five sites and turn them into a constraint on our query:
(site:simplyseafood.com | site:baycooking.com | site:coastangler.com |
site:welovefish.com | site:sea-ex.com)
Next, test the query constraints live in Google by adding a query
(in this case, salmon) and running it as a search:
salmon (site:simplyseafood.com | site:baycooking.com | site:coastangler.com
| site:welovefish.com | site:sea-ex.com)
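Assembling such a constraint by hand gets fiddly as the list grows.
The following Python sketch shows one way to build it; the function
names and the rule "anything containing a slash gets inurl:, anything
else gets site:" are assumptions made for this example, based on the
advice above about whole domains versus subdirectories.

```python
def site_constraint(targets):
    """Join domains and paths into a parenthesized, |-separated constraint."""
    parts = []
    for target in targets:
        # A bare domain is searched with site:, a path with inurl:.
        operator = "inurl:" if "/" in target else "site:"
        parts.append(operator + target)
    return "(" + " | ".join(parts) + ")"

def recipe_query(ingredients, targets):
    """Prefix the ingredients to the constraint, ready to paste into Google."""
    return "%s %s" % (ingredients, site_constraint(targets))

seafood = ["simplyseafood.com", "baycooking.com", "coastangler.com",
           "welovefish.com", "sea-ex.com"]
print(recipe_query("salmon", seafood))
```

The output is the same test query shown above, which makes it easy to
try a few ingredient words against a candidate list of sites before
adding the list to the hack.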
Run a few different queries with a few different query words
(salmon, scallops, whatever) and make sure that you're getting a
decent number of results. Once you're confident that you have a
good selection of recipes, you'll need to add this new option to
the hack:
my %recipe_types = (
  "General" => "site:allrecipes.com | site:cooking.com | site:epicurious.com | site:recipesource.com",
  "Vegetarian/Vegan" => "site:fatfree.com | inurl:veganmania | inurl:vegetarianrecipe | inurl:veggiefiles",
  "Worldwide Cuisine" => "site:Britannia.org | inurl:thegutsygourme | inurl:simpleinternet | inurl:soupsong"
);
Simply add the name you want to call the option, a =>, and the
search string. Make sure you add it before the closing
parenthesis and semicolon, and that a comma separates it from the
entry before it. Your code should look something like the code
shown next.
my %recipe_types = (
  "General" => "site:allrecipes.com | site:cooking.com | site:epicurious.com | site:recipesource.com",
  "Vegetarian/Vegan" => "site:fatfree.com | inurl:veganmania | inurl:vegetarianrecipe | inurl:veggiefiles",
  "Worldwide Cuisine" => "site:Britannia.org | inurl:thegutsygourme | inurl:simpleinternet | inurl:soupsong",
  "Seafood" => "site:simplyseafood.com | site:baycooking.com | site:coastangler.com | site:welovefish.com | site:sea-ex.com"
);
You can add as many search sets to the hack as you want. You
may want to add Chinese Cooking, Desserts, Soups, Salads, or
any number of other options.
Tara Calishain and Judy Hourihan
Hack 25. Track Result Counts over Time
Query Google for each day of a specified date range, counting
the number of results at each time index.
Sometimes the results of a search aren't of as much interest as
knowing the number thereof. How popular is a particular
keyword? How many times is so-and-so mentioned? How do
differing phrases or spellings stack up against each other?
You may also wish to track the popularity of a term over time to
watch its ups and downs, spot trends, and notice tipping points.
Combining the Google API and daterange: [Hack #16] syntax is
just the ticket.
This hack queries Google for each day over a specified date
range, counting the number of results for each day. This leads to
a list of numbers that you could enter into Excel and chart, for
example.
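The heart of the approach is nothing more than one daterange: query
per day. As a rough Python illustration (the hack itself is in Perl),
the sketch below generates those queries for a date range. The helper
names are invented, and 1721425 is the offset between Python's date
ordinal and the Julian day number.

```python
from datetime import date, timedelta

def julian_day(d):
    """Julian day number for a calendar date (ordinal + 1721425)."""
    return d.toordinal() + 1721425

def daily_queries(query, start, end):
    """Return (ISO date, query string) pairs, one per day from start to end."""
    pairs = []
    day = start
    while day <= end:
        j = julian_day(day)
        pairs.append((day.isoformat(),
                      "%s daterange:%d-%d" % (query, j, j)))
        day += timedelta(days=1)
    return pairs

for iso, q in daily_queries("OS X Panther",
                            date(2003, 10, 20), date(2003, 10, 22)):
    print(iso, q)
```

Each pair would then be sent to Google in turn, recording the
estimated result count next to the date.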
There are a couple of caveats before diving right into the code.
First, the average keyword will tend to show more results over
time as Google adds more pages to its index. Second, Google
doesn't stand behind its date range search; results shouldn't be
taken as gospel.
This hack requires the Time::JulianDay
(http://search.cpan.org/search?query=Time%3A%3AJulianDay) Perl
module.
2.7.1. The Code
Save the following code as a file named goocount.pl:
#!/usr/local/bin/perl
# goocount.pl
# Runs the specified query for every day between the specified
# start and end dates, returning date and count as CSV.
# Usage: goocount.pl query="{query}" start={date} end={date}
# where dates are of the format: yyyy-mm-dd, e.g. 2002-12-31

# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";

use SOAP::Lite;
use Time::JulianDay;
use CGI qw/:standard/;

# For checking date validity.
my $date_regex = '(\d{4})-(\d{1,2})-(\d{1,2})';

# Make sure all arguments are passed correctly.
( param('query') and param('start') =~ /^(?:$date_regex)?$/
  and param('end') =~ /^(?:$date_regex)?$/ ) or
  die qq{usage: goocount.pl query="{query}" start={date} end={date}\n};

# Julian date manipulation; missing dates default to yesterday.
my $query = param('query');
my $yesterday_julian = int local_julian_day(time) - 1;
my $start_julian = (param('start') =~ /$date_regex/)
  ? julian_day($1,$2,$3) : $yesterday_julian;
my $end_julian = (param('end') =~ /$date_regex/)
  ? julian_day($1,$2,$3) : $yesterday_julian;

# Create a new Google SOAP request.
my $google_search = SOAP::Lite->service("file:$google_wdsl");

print qq{"date","count"\n};

# Iterate over each of the Julian dates for your query.
foreach my $julian ($start_julian..$end_julian) {
  my $full_query = "$query daterange:$julian-$julian";

  # Query Google
  my $result = $google_search ->
    doGoogleSearch(
      $google_key, $full_query, 0, 10, "false", "", "false",
      "", "latin1", "latin1"
    );

  # Output
  print
    '"',
    sprintf("%04d-%02d-%02d", inverse_julian_day($julian)),
    qq{","$result->{estimatedTotalResultsCount}"\n};
}
Be sure to replace insert key here with your Google API key.
2.7.2. Running the Hack
Run the script from the command line ["How to Run the Hacks"
in the Preface], specifying a query, start, and end dates.
Perhaps you'd like to track mentions of the latest Macintosh
operating system (code name "Panther") leading up to, on, and
after its launch (October 24, 2003). The following invocation
sends its results to a comma-separated (CSV) file for easy
import into Excel or a database:
% perl goocount.pl query="OS X Panther" \
start=2003-10-20 end=2003-10-28 > count.csv
Leaving off the > and CSV filename sends the results to the
screen for your perusal:
% perl goocount.pl query="OS X Panther" \
start=2003-10-20 end=2003-10-28
If you want to track results over time, you could run the script
every day (using cron under Unix or the scheduler under
Windows), with no date specified, to get the information for the
previous day. Just use >> filename.csv to append to the file
instead of writing over it. Or you could get the results emailed to
you for your daily reading pleasure.
2.7.3. The Results
Here's that search for Panther, the new Macintosh operating
system:
% perl goocount.pl query="OS X Panther" \
start=2003-10-20 end=2003-10-28
"date","count"
"2003-10-20","28"
"2003-10-21","39"
"2003-10-22","68"
"2003-10-23","48"
"2003-10-24","98"
"2003-10-25","40"
"2003-10-26","56"
"2003-10-27","79"
"2003-10-28","130"
Notice the expected spike in new finds on release day, October
24th.
2.7.4. Working with These Results
If you have a fairly short list, it's easy to just look at the results
and see if there are any spikes or particular items of interest
about the result counts. But if you have a long list or you want a
visual overview of the results, it's easy to use these numbers to
create a graph in Excel or your favorite spreadsheet program.
Simply save the results to a file, and then open the file in Excel
and use the chart wizard to create a graph. You'll have to do
some tweaking but just generating the chart provides an
interesting overview, as shown in Figure 2-2.
Figure 2-2. An Excel graph tracking mentions of
Mac OS X Panther
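If you'd rather inspect the numbers in a script than in a
spreadsheet, the CSV output is easy to load. The Python sketch below
is an illustration only: it parses text in the same "date","count"
layout and computes the change from one day to the next. The function
names are invented, and the three rows are taken from the results
shown earlier.

```python
import csv
import io

def load_counts(text):
    """Parse "date","count" CSV text into a list of (date, int) pairs."""
    rows = csv.DictReader(io.StringIO(text))
    return [(row["date"], int(row["count"])) for row in rows]

def day_over_day(counts):
    """Return (date, change from the previous day) for each later day."""
    return [(day, count - previous)
            for (_, previous), (day, count) in zip(counts, counts[1:])]

sample = ('"date","count"\n'
          '"2003-10-23","48"\n'
          '"2003-10-24","98"\n'
          '"2003-10-25","40"\n')
counts = load_counts(sample)
print(day_over_day(counts))
```

Large positive changes are a quick way to flag candidate spikes
before going to the trouble of charting.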
2.7.5. Hacking the Hack
You can render the results as a web page by altering the code
ever so slightly and directing the output to
an HTML file (>> filename.html):
...
print
  header(),
  start_html("GooCount: $query"),
  start_table({-border=>undef}, caption("GooCount:$query")),
  Tr([ th(['Date', 'Count']) ]);
foreach my $julian ($start_julian..$end_julian) {
  my $full_query = "$query daterange:$julian-$julian";
  my $result = $google_search ->
    doGoogleSearch(
      $google_key, $full_query, 0, 10, "false", "", "false",
      "", "latin1", "latin1"
    );
  # One table row per day: the date and its result count.
  print
    Tr([ td([
      sprintf("%04d-%02d-%02d", inverse_julian_day($julian)),
      $result->{estimatedTotalResultsCount}
    ]) ]);
}
print
  end_table(),
  end_html;
Hack 26. Feel Really Lucky
Take the domain in which the first result of a query appears and
do more searching within that domain.
Does Google make you feel lucky? How lucky? Sometimes, as
lucky as the top result is, more results from the same domain
are just as much so.
This hack performs two Google queries. The domain of the top
result of the first search is saved. Then the second query is run,
searching only the saved domain for results.
Take, for example, Grace Hopper, famous both as a computer
programmer and as the person who coined the term computer
bug. If you were to run a search with "Grace Hopper" as the
primary search and overlay a search for COBOL on the domain
of the first result returned, you'd find the following three links at
the top of the results page:
GHC - 2004
http://www.gracehopper.org/ghc_press_factsheet1.html
... Website: www.gracehopper.org ... and on making machines unders
instructions led ultimately to the development of the business lan
GHC - 2004
http://www.gracehopper.org/ghc_press_factsheet.html
... Website: www.gracehopper.org ... and on making machines unders
instructions led ultimately to the development of the business lan
GHC2002
http://www.gracehopper.org/gmh2002/resources.html
... uspers-h/g-hoppr.htm. Whitman College: http://people.whitman.e
cobol.html. Yale University (The Ada Project): http ...
You could also do a primary search for a person ("Stan Laurel")
and a secondary search for another person ("Oliver Hardy"). Or
search for a person, followed by their corporate affiliation.
Don't try doing a link: search with this
hack. The link: special syntax doesn't
work with any other special syntaxes, and
this hack relies upon inurl:.
2.8.1. The Code
Save the code as goolucky.cgi, a CGI script on your web server
["How to Run the Hacks" in the Preface] or that of your Internet
service provider.
#!/usr/local/bin/perl
# goolucky.cgi
# Gleans the domain from the first (read: top) result returned, al
# you to overlay another query, and returns the results, and so on
# goolucky.cgi is called as a CGI with form input.
# Your Google API developer's key.
my $google_key='insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
use strict;
use SOAP::Lite;
use CGI qw/:standard/;
# Create a new SOAP instance.
my $google_search = SOAP::Lite->service("file:$google_wdsl");
# If this is the second time around, glean the domain.
my $query_domain = param('domain') ? "inurl:" . param('domain') :
my $results = $google_search ->
doGoogleSearch(
$google_key, param('query') . " $query_domain", 0, 10,
"false", "", "false", "", "latin1", "latin1"
);
# Set domain to the results of the previous query.
param('domain', $results->{'resultElements'}->[0]->{'URL'});
param('domain', param('domain') =~ m#://(.*?)/#);
print
header( ),
start_html("I'm Feeling VERY Lucky"),
h1("I'm Feeling VERY Lucky"),
start_form( ),
'Query: ', textfield(-name=>'query',
-default=>'"Grace Hopper"'),
' ',
'Domain: ', textfield(-name=>'domain'),
' ',
submit(-name=>'submit', -value=>'Search'),
p( ),
'Results:';
foreach (@{$results->{'resultElements'}}) {
  print p(
    b($_->{title}), br( ),
    a({href=>$_->{URL}}, $_->{URL}), br( ),
    i($_->{snippet})
  );
}
print
  end_form( ),
  end_html( );
Replace insert key here with your Google API key.
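The domain-gleaning step is easy to try on its own, without a Google API key. What follows is a minimal sketch rather than part of goolucky.cgi: the helper names domain_of and overlay_query are made up for illustration, and the match used here is a slightly more forgiving variant of the script's m#://(.*?)/# that also copes with URLs lacking a slash after the hostname.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the hostname out of a result URL.
# Variant of the script's m#://(.*?)/# that does not require a
# slash after the hostname.
sub domain_of {
    my ($url) = @_;
    return $url =~ m{://([^/]+)} ? $1 : '';
}

# Hypothetical helper: overlay a query on a domain using inurl:,
# or return the query untouched when no domain is known yet.
sub overlay_query {
    my ($query, $domain) = @_;
    return $domain ne '' ? "$query inurl:$domain" : $query;
}

my $first_result = 'http://www.gracehopper.org/ghc_press_factsheet1.html';
print overlay_query('COBOL', domain_of($first_result)), "\n";
```

The first time around there is no domain, so the query goes out as typed; on later passes the inurl: restriction narrows results to the site of the previous top hit.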
2.8.2. Running the Hack
Point your web browser at the CGI script, goolucky.cgi. The
script pops up a form in which you should enter a query and a
domain within which to search; hit the Search button when you're
ready to run your query.
2.8.3. Hacking the Hack
You can also run this hack so that it uses only one query. For
example, do a search with Query A, and the search grabs the
domain from the first result. Then run another search, again
using Query A, but restrict your results to the domain that was
grabbed in the first search. This is handy when you're trying to
get information on one set of keywords, instead of trying to link
two different concepts. Figure 2-3 illustrates the I'm Feeling
VERY Lucky search.
Figure 2-3. I'm Feeling VERY Lucky search
Hack 27. Get Random Results (on Purpose)
Surfing random pages can turn up some brilliant finds.
Why would any researcher worth her salt be interested in random
pages? While surfing random pages isn't what one might call a
focused search, you'd be surprised at some of the brilliant finds
that you'd never have come across otherwise. I've loved random
page generators associated with search engines ever since
discovering Random Yahoo! Link
(http://random.yahoo.com/bin/ryl, although no longer working at
the time of this writing). It made me think that creating such a
thing to work with the Google API might prove interesting, useful
even.
2.9.1. The Code
This code searches for a random number between 0 and 99999
(yes, you can search for 0 with Google) in addition to a modifier
pulled from the @modifiers array. To generate the random page,
you don't, strictly speaking, need something from the @modifiers
array. However, it helps make the page selection even more
random.
With the combination of a number between 0 and 99999 and a
modifier from the @modifiers array, Google will get a list of search
results, and from that list you'll get a "random" page. You could
go higher with the numbers if you wanted, but I wasn't sure that
this hack would consistently find numbers higher than 99999. (Zip
Codes are five digits, so I knew a five-digit search would find
results more often than not.)
Save the code as a CGI script named goorandom.cgi. The only
change you need to make is to replace insert key here with your
Google API key.
#!/usr/local/bin/perl
# goorandom.cgi
# Creates a random Google query and redirects the browser to
# the top/first result.
# goorandom.cgi is called as a CGI without any form input
# Your Google API developer's key.
my $google_key='insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
use strict;
use SOAP::Lite;
# A list of search modifiers to be randomly chosen amongst for
# inclusion in the query.
my @modifiers = ( "-site:com", "-site:edu", "-site:net",
"-site:org", "-site:uk", "-filetype:pdf", );
# Picking a random number and modifier combination.
my $random_number = int( rand(99999) );
my $random_modifier = $modifiers[int( rand( scalar(@modifiers) ) )];
# Create a new SOAP object.
my $google_search = SOAP::Lite->service("file:$google_wdsl");
# Query Google.
my $results = $google_search ->
doGoogleSearch(
$google_key, "$random_number $random_modifier",
0, 1, "false", "", "false", "", "latin1", "latin1"
);
# redirect the browser to the URL of the top/first result
print "Location: $results->{resultElements}->[0]->{URL}\n\n";
2.9.2. Running the Hack
This hack runs as a CGI script ["How to Run the Hacks" in the
Preface]. Point your web browser at goorandom.cgi.
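If you'd like to see what sort of queries the script dreams up before spending any of your API quota, the query-building step can be run by itself. This is only a sketch: random_query is a hypothetical helper name, and it uses int(rand(100_000)) so that 99999 itself can turn up, whereas the script's int(rand(99999)) tops out at 99998.

```perl
use strict;
use warnings;

# The same modifiers the script chooses amongst.
my @modifiers = ( "-site:com", "-site:edu", "-site:net",
                  "-site:org", "-site:uk", "-filetype:pdf" );

# Hypothetical helper: build one random query string from a
# random number and a randomly chosen modifier.
# Assumption: int(rand(100_000)) covers 0..99999 inclusive.
sub random_query {
    my @mods     = @_;
    my $number   = int( rand(100_000) );
    my $modifier = $mods[ int( rand( scalar @mods ) ) ];
    return "$number $modifier";
}

# Print a few sample queries.
print random_query(@modifiers), "\n" for 1 .. 3;
```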
2.9.3. Hacking the Hack
There are a couple of ways to hack this hack.
2.9.3.1 Modifying the modifiers
You'll notice each modifier in the @modifiers array is preceded by
a negative (which means "exclude this"). You can, of course, add
anything you wish, but it's highly recommended that you keep to
the negative theme; including something like "computers" in the
list gives you a chance (a slight chance, but a chance
nevertheless) of coming up with no search results at all. The hack
randomly excludes domains; here are a few more possibilities:
-intitle:queryword
-inurl:www
-inurl:queryword
-internet
-yahoo
-intitle:the
If you want to, you could create modifiers that use OR (|) instead
of negatives, and then slant them to a particular topic. For
example, you could create an array with a medical slant that
looks like this:
(medicine | treatment | therapy)
(cancer | chemotherapy | drug)
(symptoms | "side effects")
(medical | research | hospital)
(inurl:edu | inurl:gov )
Using the OR modifier does not guarantee finding a search result
like using a negative does, so don't narrow your possible results
by restricting your search to the page's title or URL.
2.9.3.2 Adding a touch more randomness
The hack, as it stands, always picks the first result. While it's
already highly unlikely that you'll ever see the same random
page twice, you can achieve a touch more randomness by
choosing a random returned result. Take a gander at the actual
search itself in the hack's code:
my $results = $google_search ->
doGoogleSearch(
$google_key, "$random_number $random_modifier",
0, 1, "false", "", "false", "", "latin1", "latin1"
);
You see that 0 at the beginning of the fourth line? That's the
offset: the number of the first result to return. Change that
number to anything between 0 and 999, and you'll shift the results
returned by that number, assuming, of course, that the number
you choose is smaller than the number of results for the query at
hand. For the sake of just about guaranteeing a result, it's
probably best to stick to numbers between 0 and 10. How about
randomizing the offset? Simply alter the code as follows (the
changes are the new $random_offset variable and its use in place
of the 0 offset):
...
# picking a random number, modifier, and offset combination
my $random_number = int( rand(99999) );
my $random_modifier = $modifiers[int( rand( scalar(@modifiers) ) )];
my $random_offset = int( rand(10) );
...
my $results = $google_search ->
doGoogleSearch(
$google_key, "$random_number $random_modifier",
$random_offset, 1, "false", "", "false", "", "latin1", "latin1"
);
...
Hack 28. Permute a Query
Run all permutations of query keywords and phrases to squeeze
the last drop of results from the Google index.
Google, ah, Google. Search engine of over eight billion pages and
zillions of possible results. If you're a search engine geek like I
am, few things are more entertaining than trying various tweaks
with your Google search to see what exactly makes a difference
to the results.
It's amazing what makes a difference. For example, you wouldn't
think that word order would make much of an impact, but it does.
In fact, buried in Google's documentation is the admission that
the word order of a query will impact search results.
While that's an interesting thought, who has time to generate
and run every possible iteration of a multiword query? Google
API to the rescue! This hack takes a query of up to four
keywords or "quoted phrases" (as well as supporting special
syntaxes) and runs all possible permutations, showing result
counts by permutation and the top results for each permutation.
2.10.1. The Code
Save the following code as a CGI script ["How to Run the
Hacks" in the Preface] named order_matters.cgi in your web
site's cgi-bin directory. As you type in the script, be sure to
replace insert key here with your Google API key.
You'll need to have the Algorithm::Permute Perl module for this
program to work correctly
(http://search.cpan.org/search?query=algorithm%3A%3Apermute&mode=all).
#!/usr/local/bin/perl
# order_matters.cgi
# Queries Google for every possible permutation of up to 4 query k
# returning result counts by permutation and top results across pe
# order_matters.cgi is called as a CGI with form input
# Your Google API developer's key.
my $google_key='insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
use strict;
use SOAP::Lite;
use CGI qw/:standard *table/;
use Algorithm::Permute;
print
header( ),
start_html("Order Matters"),
h1("Order Matters"),
start_form(-method=>'GET'),
'Query: ', textfield(-name=>'query'),
' ',
submit(-name=>'submit', -value=>'Search'), br( ),
'<font size="-2" color="green">Enter up to 4 query keywords or "
end_form( ), p( );
if (param('query')) {
# Glean keywords.
my @keywords = grep !/^\s*$/, split /([+-]?".+?")|\s+/, param('q
scalar @keywords > 4 and
print('<font color="red">Only 4 query keywords or phrases allowe
my $google_search = SOAP::Lite->service("file:$google_wdsl");
print
start_table({-cellpadding=>'10', -border=>'1'}),
Tr([th({-colspan=>'2'}, ['Result Counts by Permutation' ])]),
Tr([th({-align=>'left'}, ['Query', 'Count'])]);
my $results = {}; # keep track of what we've seen across queries
# Iterate over every possible permutation.
my $p = new Algorithm::Permute( \@keywords );
while (my $query = join(' ', $p->next)) {
# Query Google.
my $r = $google_search ->
doGoogleSearch(
$google_key,
$query,
0, 10, "false", "", "false", "", "latin1", "latin1"
);
print Tr([td({-align=>'left'}, [$query, $r->{'estimatedTotalR
@{$r->{'resultElements'}} or next;
# Assign a rank.
my $rank = 10;
foreach (@{$r->{'resultElements'}}) {
  $results->{$_->{URL}} = {
    title => $_->{title},
    snippet => $_->{snippet},
    seen => ($results->{$_->{URL}}->{seen}) + $rank
  };
  $rank--;
}
}
print
end_table( ), p( ),
start_table({-cellpadding=>'10', -border=>'1'}),
Tr([th({-colspan=>'2'}, ['Top Results across Permutations' ])]),
Tr([th({-align=>'left'}, ['Score', 'Result'])]);
foreach ( sort { $results->{$b}->{seen} <=> $results->{$a}->{seen}
print Tr(td([
$results->{$_}->{seen},
b($results->{$_}->{title}||'no title') . br( ) .
a({href=>$_}, $_) . br( ) .
i($results->{$_}->{snippet}||'no snippet')
]));
print end_table( ),
print end_html( );
2.10.2. Running the Hack
Point your web browser at the CGI script order_matters.cgi on
your web server. Enter the query you want to check (up to four
words or phrases). The script will first search for every possible
combination of the search words and phrases, as shown in Figure
2-4.
Figure 2-4. Permutations for applescript google api
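The permutation step itself needs neither Google nor a CPAN module, so it's worth a look in isolation. The sketch below is not the script's code: keywords and permutations are hypothetical helpers, with a plain recursive routine standing in for Algorithm::Permute. The keyword splitter keeps "quoted phrases" whole but, as a simplification, doesn't try to cope with a phrase glued to a special syntax (intitle:"two words", say).

```perl
use strict;
use warnings;

# Hypothetical helper: split a query into keywords, keeping
# "quoted phrases" (optionally prefixed with + or -) in one piece.
sub keywords {
    my ($query) = @_;
    return $query =~ /([+-]?"[^"]+"|\S+)/g;
}

# Hypothetical helper: return every ordering of the given items,
# each as an array reference. A stand-in for Algorithm::Permute.
sub permutations {
    my @items = @_;
    return ( [] ) unless @items;
    my @result;
    for my $i ( 0 .. $#items ) {
        my @rest = @items;
        my ($picked) = splice @rest, $i, 1;
        push @result, [ $picked, @$_ ] for permutations(@rest);
    }
    return @result;
}

# Three keywords yield six orderings; four yield twenty-four.
my @terms = keywords('applescript google api');
print join(' ', @$_), "\n" for permutations(@terms);
```

Four keywords already means 24 queries per run, which is presumably why the script draws the line there.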
The script will then display the top 10 search results across all
permutations of the query, as shown in Figure 2-5.
Figure 2-5. Top results for permutations of applescript google api
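The Score column comes from a simple tally: within each permutation's results, the first hit earns 10 points, the second 9, and so on, and a URL's points are summed across permutations. Here's a rough sketch of that bookkeeping, assuming a hypothetical score_results helper and made-up URLs in place of live Google results.

```perl
use strict;
use warnings;

# Hypothetical helper: takes one array reference of result URLs per
# permutation (best result first) and returns a hash of URL => score,
# awarding 10 points for first place down to 1 for tenth.
sub score_results {
    my @result_lists = @_;
    my %score;
    for my $urls (@result_lists) {
        my $rank = 10;
        for my $url (@$urls) {
            last if $rank < 1;
            $score{$url} += $rank--;
        }
    }
    return %score;
}

# Made-up URLs, purely for illustration.
my %score = score_results(
    [ 'http://a.example/', 'http://b.example/' ],
    [ 'http://b.example/', 'http://a.example/', 'http://c.example/' ],
);

# Highest score first, ties broken alphabetically.
for my $url ( sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score ) {
    print "$score{$url}\t$url\n";
}
```

A page that sits near the top no matter how the words are shuffled piles up points; one that appears for a single ordering only does not.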
At first blush, this hack looks like a novelty with few practical
applications. But if you're a regular researcher or a web wrangler,
you might find it of interest.
If you're a regular researcher (that is, there are certain topics
that you research on a regular basis), you might want to spend
some time with this hack and see if you can detect a pattern in
how your regular search terms are impacted by changing word
order. You might need to revise your searching so that certain
words always come first or last in your query.
If you're a web wrangler, you need to know where your page
appears in Google's search results. If your page loses a lot of
ranking ground because of a shift in a query arrangement, maybe
you want to add some more words to your text or shift your
existing text.
Hack 29. Weight a Query Keyword
Add more weight to a particular keyword in your Google search
query for more targeted results.
As we've mentioned before, Google will provide different results
based on how many times a search word is used in a query [Hack
#15].
Part of this is assumed to be because Google tries to search
your query in order, giving higher relevance to words that appear
in the same order in a page as they do in your query. (In other
words, when you search for baseball baseball, Google tries to
find, and assigns more weight to, pages that have the phrase
"baseball baseball" in them, even if you don't use quotes in your
search.)
Google also seems to look for multiple iterations of the same
word in your result pages. So when you search for baseball
baseball baseball, Google looks for the word baseball three times
in your result pages.
This is a nifty and little-known search trick when you want to
emphasize a particular word in your search, but typing in the
same word several times is a drag. This hack automates the
process for you, adding weight to the keyword of your choice and
sending you on to Google for results.
2.11.1. The Code
Save the following code as a CGI script ["How to Run the
Hacks" in the Preface] named sinker.cgi in your web site's cgi-bin
directory.
This script doesn't require the Google API! Instead, it directly
opens a Google search result URL.
#!/usr/local/bin/perl
# sinker.cgi
# Weight a specific keyword in a Google query.
use strict;
use CGI qw{:standard};
# Display the query form if both a query and sinker are not provid
unless (param('query') and param('sinker')) {
print
header( ),
start_html("Search Sinker"),
h1("Search Sinker"),
start_form(-method=>'GET'),
'Query: ', textfield(-name=>'query'), p( ),
'Sinker (word to weight): ', textfield(-name=>'sinker'), p( ),
submit(-name=>'submit', -value=>'Search'), p( ),
"You will be taken straight to a Google results page. Be sure
query box at the top of the page to see how your query turned out.
end_form( ), p( );
}
else {
# Normalize the sinker and prefix with a space.
my($sinker) = param('sinker') =~ /^\s*(.+)\s*$/;
$sinker = " $sinker";
# Build the Google query URL.
my $query = param('query') . $sinker x ( 10 - (scalar split / /
# Webify the query.
$query =~ s/([^a-zA-Z0-9 ])/"%".sprintf("%2.2x", unpack("C", ($1
$query =~ tr/ /+/;
# Redirect the browser to Google, sinker search in tow
print redirect("http://www.google.com/search?num=100&q=$query");
}
2.11.2. Running the Hack
Point your browser at the sinker.cgi script on your web server. In
the form, enter any query in the Query field and the word you
want to emphasize in the Sinker field and click the Search
button. You'll be redirected to a Google results page chock-full of
up to 100 results.
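The padding and "webifying" that go on behind the form are simple enough to try by hand. The following is a sketch under a few assumptions, not the script itself: weight_query and webify are hypothetical helper names, the query is counted in whitespace-separated words, and the total is capped at 10 words to match the limit the script works to.

```perl
use strict;
use warnings;

# Hypothetical helper: pad a query with copies of the sinker word
# until the whole thing is $limit words long (10 by default).
sub weight_query {
    my ($query, $sinker, $limit) = @_;
    $limit = 10 unless defined $limit;
    my @words  = split ' ', $query;
    my $copies = $limit - @words;
    $copies = 0 if $copies < 0;
    return $query . " $sinker" x $copies;
}

# Hypothetical helper: percent-encode anything that isn't a letter,
# digit, or space, then turn spaces into plus signs.
sub webify {
    my ($query) = @_;
    $query =~ s/([^a-zA-Z0-9 ])/sprintf("%%%02x", ord($1))/ge;
    $query =~ tr/ /+/;
    return $query;
}

my $weighted = weight_query('"Moises Alou"', 'baseball');
print "$weighted\n";
print webify($weighted), "\n";
```

Because the count is by whitespace-separated word, a two-word quoted phrase uses up two of the ten slots, leaving room for eight copies of the sinker.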
Be sure to take a gander at how your query turned out in the
Google search box at the top of the results page. If, for instance,
you queried for "Moises Alou" with a sinker of baseball, the
resulting query sent on to Google would look like this:
"Moises Alou" baseball baseball baseball baseball baseball basebal
Hack 30. Restrict Searches to Top-Level Results
Separate out search results by the depth at which they appear
in a site.
Google's a mighty big haystack in which to find the needle you
seek. And there's more, so much more: some experts believe
that Google and its ilk index only a bare fraction of the pages
available on the Web.
Because the Web's growing all the time, researchers have to
come up with lots of different tricks to narrow down search
results. Tricks and, thanks to the Google API, tools. This hack
separates out search results appearing at the top level of a
domain from those beneath.
Why would you want to do this?
• Clear away clutter when searching for proper names. If
  you're searching for general information about a proper
  name, this is one way to clear out mentions in news
  stories, etc. For example, the name of a political leader
  such as Tony Blair might be mentioned in a story without
  any substantive information about the man himself. But if
  you limited your results to only those pages on the top
  level of a domain, you would avoid most of those mention
  hits.
• Find patterns in the association of highly ranked
  domains and certain keywords.
• Narrow search results to only those bits that sites deem
  important enough to have in their virtual foyers.
• Skip past subsites, such as home pages created by J.
  Random User on his service provider's web server.
2.12.1. The Code
Save the code as a CGI script ["How to Run the Hacks" in the
Preface] named gootop.cgi:
#!/usr/local/bin/perl
# gootop.cgi
# Separates out top-level and sub-level results.
# gootop.cgi is called as a CGI with form input.
# Your Google API developer's key.
my $google_key='insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
# Number of times to loop, retrieving 10 results at a time.
my $loops = 10;
use strict;
use SOAP::Lite;
use CGI qw/:standard *table/;
print
header( ),
start_html("GooTop"),
h1("GooTop"),
start_form(-method=>'GET'),
'Query: ', textfield(-name=>'query'),
' ',
submit(-name=>'submit', -value=>'Search'),
end_form( ), p( );
my $google_search = SOAP::Lite->service("file:$google_wdsl");
if (param('query')) {
my $list = { 'toplevel' => [], 'sublevel' => [] };
for (my $offset = 0; $offset < $loops*10; $offset += 10) {
my $results = $google_search ->
doGoogleSearch(
$google_key, param('query'), $offset,
10, "false", "", "false", "", "latin1", "latin1"
);
foreach (@{$results->{'resultElements'}}) {
push @{
$list->{ $_->{URL} =~ m!://[^/]+/?$!
? 'toplevel' : 'sublevel' }
},
p(
b($_->{title}||'no title'), br( ),
a({href=>$_->{URL}}, $_->{URL}), br( ),
i($_->{snippet}||'no snippet')
);
}
}
print
h2('Top-Level Results'),
join("\n", @{$list->{toplevel}}),
h2('Sub-Level Results'),
join("\n", @{$list->{sublevel}});
}
print end_html;
Gleaning a decent number of top-level domain results means
throwing out quite a bit. It's for this reason that this script runs
the specified query a number of times, as specified by my $loops
= 10;, each loop picking up 10 results, some subset being
top-level. To alter the number of loops per query, simply change the
value of $loops. Realize that each invocation of the script burns
through $loops number of queries, so be sparing and don't bump
that number up to anything ridiculous; even 100 will eat through
a daily allotment in just 10 invocations.
The heart of the script, and what differentiates it from your
average Google API Perl script [Hack #92], lies in the code that
follows.
push @{
$list->{ $_->{URL} =~ m!://[^/]+/?$!
? 'toplevel' : 'sublevel' }
What that jumble of characters is scanning for is :// (as in
http://) followed by anything other than a / (slash) all the way to
the end of the URL, thereby sifting between top-level finds (e.g.,
http://www.berkeley.edu) and sublevel results (e.g.,
http://www.berkeley.edu/welcome.html or
http://www.berkeley.edu/students/john_doe/my_dog.html).
If you're Perl savvy, you may have noticed the trailing /?$; this
allows for the eventuality that a top-level URL ends with a slash
(e.g., http://www.berkeley.edu/), as is often true.
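To convince yourself of what the pattern does and doesn't let through, you can run it against a handful of URLs without touching the API. In this small sketch the regular expression is the one from the script; the is_top_level wrapper and the sample list are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the script's pattern: true when the
# URL is nothing but a hostname, with or without a trailing slash.
sub is_top_level {
    my ($url) = @_;
    return $url =~ m!://[^/]+/?$! ? 1 : 0;
}

# Sample URLs, sorted into the same two buckets the script uses.
my %list = ( toplevel => [], sublevel => [] );
for my $url (
    'http://www.berkeley.edu',
    'http://www.berkeley.edu/',
    'http://www.berkeley.edu/welcome.html',
    'http://www.berkeley.edu/students/john_doe/my_dog.html',
) {
    push @{ $list{ is_top_level($url) ? 'toplevel' : 'sublevel' } }, $url;
}

print "Top-level:\n", map( { "  $_\n" } @{ $list{toplevel} } );
print "Sub-level:\n", map( { "  $_\n" } @{ $list{sublevel} } );
```

Anything with a path after the hostname, even a single page, lands in the sub-level bucket.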
2.12.2. Running the Hack
This hack runs as a CGI script. Figure 2-6 shows the results of
a search for non-gmo (Genetically Modified Organisms, that is).
Figure 2-6. GooTop search for non-gmo
2.12.3. Hacking the Hack
There are a couple of ways to hack this hack.
2.12.3.1 More depth
Perhaps your interests lie in just how deep results are within a
site or sites. A minor adjustment or two to the code and you
have results grouped by depth:
#!/usr/bin/perl
# gootop.cgi
# Separates out top level and sub-level results
# gootop.cgi is called as a CGI with form input.
# Your Google API developer's key.
my $google_key='insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
# Number of times to loop, retrieving 10 results at a time.
my $loops = 1;
use strict;
use lib qw!/home/rael/lib/perl!; #FIXME
use SOAP::Lite;
use CGI qw/:standard *table/;
print
header( ),
start_html("GooTop"),
h1("GooTop"),
start_form(-method=>'GET'),
'Query: ', textfield(-name=>'query'),
' ',
submit(-name=>'submit', -value=>'Search'),
end_form( ), p( );
my $google_search = SOAP::Lite->service("file:$google_wdsl");
if (param('query')) {
my @list = ( );
for (my $offset = 0; $offset < $loops*10; $offset += 10) {
my $results = $google_search ->
doGoogleSearch(
$google_key, param('query'), $offset,
10, "false", "", "false", "", "latin1", "latin1"
);
foreach (@{$results->{'resultElements'}}) {
push @{ $list[scalar ( split(/\//, $_->{URL} . ' ') - 3 ) ]
p(
b($_->{title}||'no title'), br( ),
a({href=>$_->{URL}}, $_->{URL}), br( ),
i($_->{snippet}||'no snippet')
);
for my $level (1..$#list) {
print h2("Level: $level");
ref $list[$level] eq 'ARRAY' and print join "\n", @{$list[$lev
print end_html;
Figure 2-7 shows that non-gmo search again using the depth
hack.
Figure 2-7. GooTop non-gmo search using depth hack
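The level number in the depth variant comes from counting the pieces of the URL between slashes. Here's a sketch of just that arithmetic, under the assumption that it mirrors the script's split-and-subtract-3 expression; url_level is a hypothetical helper name. The trailing space tacked onto the URL is there because split throws away trailing empty fields, which would otherwise make http://www.berkeley.edu/ look the same as http://www.berkeley.edu.

```perl
use strict;
use warnings;

# Hypothetical helper: the level of a URL, counted the way the
# depth variant does. "http:", the empty field between the two
# slashes, and the hostname account for the 3 that is subtracted.
sub url_level {
    my ($url) = @_;
    my @parts = split m{/}, "$url ";
    return @parts - 3;
}

# Group some sample URLs by level, as the script groups results.
my @list;
for my $url (
    'http://www.berkeley.edu/',
    'http://www.berkeley.edu/welcome.html',
    'http://www.berkeley.edu/students/john_doe/my_dog.html',
) {
    push @{ $list[ url_level($url) ] }, $url;
}

for my $level ( 1 .. $#list ) {
    print "Level: $level\n";
    print "  $_\n" for @{ $list[$level] || [] };
}
```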
2.12.3.2 Query tips
Along with the aforementioned code hacking, here are a few
query tips to use with this hack:
• Consider feeding the script a daterange: [Hack #16]
  query to further narrow results.
• Keep your searches specific, but not too much so for
  fear of turning up no top-level results. Instead of cats,
  for example, use "burmese cats", but don't try "burmese
  breeders" feeding.
• Try the link: syntax ["Special Syntax" in Chapter 1].
  This is a nice use of a syntax otherwise not allowed in
  combination ["Mixing Syntaxes" in Chapter 1] with any
  others.
• On occasion, intitle: works nicely with this hack. Try
  your query without special syntaxes first, though, and
  work your way up, making sure you're getting results
  after each change.
Hack 31. Search for Special Characters
Search for the tilde and other special characters in URLs.
Google can find lots of different things, but at the time of this
writing, it can't find special characters (except for $, recently
added for use in number range searches ["Google Web Search
Basics" and "Number Range" in Chapter 1]). That's a shame,
because special characters can come in handy. The tilde (~), for
example, denotes personal web pages.
This hack takes a query from a form, pulls results from Google,
and filters the results for the presence of several different
special characters in the URL, including the tilde.
Why would you want to do this? By altering this hack slightly
(see the "Hacking the Hack" section), you could restrict your
searches to just pages with a tilde in the URL, an easy way to
find personal pages. Maybe you're looking for dynamically
generated pages with a question mark (?) in the URL; you can't
find these using Google by itself, but you can with this hack.
And, of course, you can turn the hack inside-out and not return
results containing ~, ?, or other special characters. In fact, this
code is more of a beginning than an end unto itself: you can
tweak it in several different ways to do several different things.
2.13.1. The Code
Save this code to a text file called aunt_tilde.cgi. Replace insert
key here with your Google API key.
#!/usr/local/bin/perl
# aunt_tilde.cgi
# Finding special characters in Google result URLs.
# Your Google API developer's key.
my $google_key='insert key here';
# Number of times to loop, retrieving 10 results at a time.
my $loops = 10;
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
use strict;
use CGI qw/:standard/;
use SOAP::Lite;
print
header( ),
start_html("Aunt Tilde"),
h1("Aunt Tilde"),
start_form(-method=>'GET'),
'Query: ', textfield(-name=>'query'),
br( ),
'Characters to find: ',
checkbox_group(
-name=>'characters',
-values=>[qw/ ~ @ ? ! /],
-defaults=>[qw/ ~ /]
),
br( ),
submit(-name=>'submit', -value=>'Search'),
end_form( ), p( );
if (param('query')) {
# Create a regular expression to match preferred special charact
my $special_regex = '[\\' . join('\\', param('characters')) . ']
my $google_search = SOAP::Lite->service("file:$google_wdsl");
for (my $offset = 0; $offset < $loops*10; $offset += 10) {
my $results = $google_search ->
doGoogleSearch(
$google_key, param('query'), $offset, 10, "false", "", "f
"", "latin1", "latin1"
);
last unless @{$results->{resultElements}};
foreach my $result (@{$results->{'resultElements'}}) {
# Output only matched URLs, highlighting special characters
my $url = $result->{URL};
$url =~ s!($special_regex)!<font color="red">$1</font>!g an
print
p(
b(a({href=>$result->{URL}},$result->{title}||'no title
$url, br( ),
i($result->{snippet}||'no snippet')
);
print end_html;
}
2.13.2. Running the Hack
Point your browser at the aunt_tilde.cgi CGI script, type a search
query into the Query field, click the checkboxes next to the
special characters you're after, and click the Search button.
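Under the hood, the checked characters are rolled into a single character class, and only URLs matching it are shown, with the matches highlighted. The sketch below illustrates the idea on a couple of made-up URLs; special_regex and highlight are hypothetical helpers, and quotemeta is used here for the escaping where the script prefixes each character with a backslash by hand.

```perl
use strict;
use warnings;

# Hypothetical helper: build a character-class regex from a list of
# single characters. Returns nothing if no characters were chosen.
sub special_regex {
    my @chars = @_;
    return unless @chars;
    my $class = join '', map { quotemeta } @chars;
    return qr/[$class]/;
}

# Hypothetical helper: wrap each special character found in the URL
# in the same red font tag the script uses.
sub highlight {
    my ($url, $re) = @_;
    (my $marked = $url) =~ s!($re)!<font color="red">$1</font>!g;
    return $marked;
}

my $re = special_regex('~', '?');

# Made-up URLs, purely for illustration.
for my $url ( 'http://www.example.edu/~jdoe/', 'http://www.example.com/' ) {
    print highlight($url, $re), "\n" if $url =~ $re;
}
```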
2.13.3. Hacking the Hack
There are a couple of interesting ways to change this hack.
2.13.3.1 Choosing special characters
You can easily alter the list of special characters that you're
interested in by changing one line in the script:
-values=>[qw/ ~ @ ? ! /],
Simply add or remove special characters from the space-delimited
list between the / (forward slash) characters. If, for
example, you want to add & (ampersands) and z (why not?), while
dropping ? (question marks), that line of code should be:
-values=>[qw/ ~ @ ! & z /],
Don't forget those spaces between characters in the list.
2.13.3.2 Excluding special characters
You can just as easily decide to exclude URLs that contain your
special characters as include them. Simply change the =~ (read:
does match) in this line:
$url =~ s!($special_regex)!<font color="red">$1</font>!g and
to !~ (read: does not match), leaving:
$url !~ s!($special_regex)!<font color="red">$1</font>!g and
Now, any result containing the specific characters will not show
up.
Hack 32. Dig Deeper into Sites
Dig deeper into the hierarchies of web sites matching your
search criteria.
One of Google's big strengths is that it can find your search
term instantly and with great precision. But sometimes you're
not interested so much in one definitive result as in lots of
diverse results; maybe you even want some that are a bit more
on the obscure side.
One method I've found rather useful is to ignore all results
shallower than a particular level in a site's directory hierarchy.
You avoid all the clutter of finds on home pages and go for
subject matter otherwise hidden away in the depths of a site's
structure. While content comes and goes, ebbs and flows from a
site's main focus, it tends to gather in more permanent locales,
categorized and archived, like with like.
This script asks for a query along with a preferred depth, above
which results are thrown out. Specify a depth of four and your
results will come only from http://example.com/a/b/c/d, not /a,
/a/b/, or /a/b/c.
Because you're already limiting the kinds of results that you
see, it's best to use more common words for what you're looking
for. Obscure query terms can often return absolutely no results.
The default number of loops, retrieving 10 items apiece, is set
to 50. This is to assure that you glean some decent number of
results because many will be tossed. You can, of course, alter
this number, but bear in mind that you're using that number of
your daily quota of 1,000 Google API queries per developer's key.
2.14.1. The Code
Save this code as deep_blue_g.cgi, a CGI script ["How to Run
the Hacks" in the Preface] on your web server. As you type it in,
replace insert key here with your Google API key.
#!/usr/local/bin/perl
# deep_blue_g.cgi
# Limiting search results to a particular depth in a web
# site's hierarchy.
# deep_blue_g.cgi is called as a CGI with form input.
# Your Google API developer's key.
my $google_key='insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
# Number of times to loop, retrieving 10 results at a time.
my $loops = 10;
use SOAP::Lite;
use CGI qw/:standard *table/;
print
header( ),
start_html("Fishing in the Deep Blue G"),
h1("Fishing in the Deep Blue G"),
start_form(-method=>'GET'),
'Query: ', textfield(-name=>'query'),
br( ),
'Depth: ', textfield(-name=>'depth', -default=>4),
br( ),
submit(-name=>'submit', -value=>'Search'),
end_form( ), p( );
# Make sure a query and numeric depth are provided.
if (param('query') and param('depth') =~ /\d+/) {
# Create a new SOAP object.
my $google_search = SOAP::Lite->service("file:$google_wdsl");
for (my $offset = 0; $offset < $loops*10; $offset += 10) {
my $results = $google_search ->
doGoogleSearch(
$google_key, param('query'), $offset, 10, "false", "", "f
"", "latin1", "latin1"
);
last unless @{$results->{resultElements}};
foreach my $result (@{$results->{'resultElements'}}) {
# Determine depth.
my $url = $result->{URL};
$url =~ s!^\w+://|/$!!g;
# Output only those deep enough.
( split(/\//, $url) - 1) >= param('depth') and
print
p(
b(a({href=>$result->{URL}},$result->{title}||'no title
$result->{URL}, br( ),
i($result->{snippet}||'no snippet')
);
}
print end_html;
2.14.2. Running the Hack
This hack runs as a CGI script. Point your browser at
deep_blue_g.cgi, fill out the query and depth fields, and click the
Search button.
Figure 2-8 shows a query for "Jacques Cousteau", restricting
results to a depth of six; that's six levels down from the site's
home page. You'll notice some pretty long URLs in there.
Figure 2-8. A search for "Jacques Cousteau", restricting results to six levels down
2.14.3. Hacking the Hack
Perhaps you're interested in just the opposite of what this hack
provides: you want only results from higher up in a site's
hierarchy. Hacking this hack is simple enough: swap in a < (less
than) symbol instead of the > (greater than) in the depth
comparison, leaving the following line:
( split(/\//, $url) - 1) <= param('depth') and
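The depth test, in either direction, boils down to stripping the protocol and any trailing slash and then counting slashes. Here's a sketch you can poke at offline; path_depth, deep_enough, and shallow_enough are hypothetical helper names wrapped around the same substitution and split the script uses, and the example.com URLs are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: how many levels below the hostname a URL sits.
sub path_depth {
    my ($url) = @_;
    $url =~ s!^\w+://|/$!!g;          # drop protocol and trailing slash
    my @parts = split m{/}, $url;     # hostname plus path segments
    return @parts - 1;
}

# Hypothetical helpers: the original test and its reverse.
sub deep_enough    { my ($depth, @urls) = @_; grep { path_depth($_) >= $depth } @urls }
sub shallow_enough { my ($depth, @urls) = @_; grep { path_depth($_) <= $depth } @urls }

my @urls = map { "http://example.com$_" } ( '/a', '/a/b/', '/a/b/c', '/a/b/c/d' );

print "Depth 4 or deeper:\n";
print "  $_\n" for deep_enough(4, @urls);
print "Depth 2 or shallower:\n";
print "  $_\n" for shallow_enough(2, @urls);
```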
Hack 33. Summarize Results by Domain
Get an overview of the sorts of domains (educational,
commercial, foreign, and so forth) found in the results of a
Google query.
You want to know about a topic, so you do a search. But what do
you have? A list of pages. You can't get a good idea of the types
of pages these are without taking a close look at the list of sites.
This hack is an attempt to get a snapshot of the types of sites
that result from a query. It does this by taking a suffix census, a
count of the different domains that appear in search results.
This is ideal for running link: queries, providing a good
idea of what kinds of domains (commercial, educational, military,
foreign, etc.) are linking to a particular page.
You could also run it to see where technical terms, slang terms,
and unusual words are turning up. Which pages mention a
particular singer more often? Or a political figure? Does the word
"democrat" come up more often on .com or .edu sites?
Of course, this snapshot doesn't provide a complete inventory,
but as overviews go, it's rather interesting.
2.15.1. The Code
Save the c ode as s uffixcens us .cgi, a C G I s c ript ["H ow to Run the
H ac ks " in the P refac e] on your web s erver:
#!/usr/local/bin/perl
# suffixcensus.cgi
# Generates a snapshot of the kinds of sites responding to a
# query. The suffix is the .com, .net, or .uk part.
# suffixcensus.cgi is called as a CGI with form input.
# Your Google API developer's key.
my $google_key='insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
# Number of times to loop, retrieving 10 results at a time.
my $loops = 10;
use SOAP::Lite;
use CGI qw/:standard *table/;
print
header( ),
start_html("SuffixCensus"),
h1("SuffixCensus"),
start_form(-method=>'GET'),
'Query: ', textfield(-name=>'query'),
' ',
submit(-name=>'submit', -value=>'Search'),
end_form( ), p( );
if (param('query')) {
  my $google_search = SOAP::Lite->service("file:$google_wdsl");
  my %suffixes;
  # Make $loops visits to Google, 10 results per visit.
  for (my $offset = 0; $offset < $loops*10; $offset += 10) {
    my $results = $google_search ->
      doGoogleSearch(
        $google_key, param('query'), $offset, 10, "false", "", "false",
        "", "latin1", "latin1"
      );
    last unless @{$results->{resultElements}};
    # Tally the suffix (com, net, uk, ...) of each result's URL.
    map { $suffixes{ ($_->{URL} =~ m#://.+?\.(\w{2,4})/#)[0] }++ }
      @{$results->{resultElements}};
  }
  print
    h2('Results: '), p( ),
    start_table({cellpadding => 5, cellspacing => 0, border => 1}),
    map( { Tr(td(uc $_),td($suffixes{$_})) } sort keys %suffixes ),
    end_table( );
}
print end_html( );
Be sure to replace insert key here with your Google API key.
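The counting step is easy to experiment with on its own. Here is a rough Python sketch that applies the same regular expression as the script to a plain list of URLs; the function name suffix_census and the sample URLs are made up for illustration, and no Google API call is involved. Unlike the Perl, it simply skips URLs that the pattern does not match.

```python
import re
from collections import Counter

# Same pattern as the script: the first run of 2-4 word characters
# that sits between a dot and a slash after the scheme.
SUFFIX_RE = re.compile(r"://.+?\.(\w{2,4})/")


def suffix_census(urls):
    """Return a Counter mapping each domain suffix to its number of URLs."""
    counts = Counter()
    for url in urls:
        match = SUFFIX_RE.search(url)
        if match:  # URLs without a recognizable suffix are skipped
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "http://www.example.com/",
        "http://example.co.uk/page",
        "http://www.example.edu/a/b.html",
        "http://another.example.com/x",
    ]
    for suffix, count in sorted(suffix_census(sample).items()):
        print(suffix.upper(), count)
```

Note that the pattern needs a slash after the host name, so a bare URL such as http://example.com would not be counted.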
2.15.2. Running the Hack
This hack runs as a CGI script. Point your browser at
suffixcensus.cgi to run it.
2.15.3. The Results
Searching for the prevalence of "soda pop" by suffix finds, as one
might expect, the most mentions on .coms, as shown in Figure 2-9.
Figure 2-9. Prevalence of "soda pop" by suffix
2.15.4. Hacking the Hack
There are a couple of ways to hack this hack.
2.15.4.1 Going back for more
This script, by default, visits Google 10 times, grabbing the top
100 (or fewer, if there aren't as many) results. To increase or
decrease the number of visits, simply change the value of the
$loops variable at the top of the script. Bear in mind, however,
that making $loops = 50 might net you 500 results, but you're
also eating quickly into your daily allotment of 1,000 Google
API queries.
2.15.4.2 Returning comma-separated output
It's rather simple to adjust this script to run from the command
line and return comma-separated output suitable for Excel or
your average database. Remove the starting HTML, form, and
ending HTML output, and alter the code that prints out the
results. In the end, you come to something like this:
#!/usr/local/bin/perl
# suffixcensus_csv.pl
# Generates a snapshot of the kinds of sites responding to a
# query. The suffix is the .com, .net, or .uk part.
# Usage: perl suffixcensus_csv.pl query="your query" > results.csv
# Your Google API developer's key.
my $google_key='insert key';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
# Number of times to loop, retrieving 10 results at a time.
my $loops = 1;
use SOAP::Lite;
use CGI qw/:standard/;
param('query')
  or die qq{usage: suffixcensus_csv.pl query="{query}" [> results.csv]\n};
print qq{"suffix","count"\n};
my $google_search = SOAP::Lite->service("file:$google_wdsl");
my %suffixes;
for (my $offset = 0; $offset <= $loops*10; $offset += 10) {
  my $results = $google_search ->
    doGoogleSearch(
      $google_key, param('query'), $offset, 10, "false", "", "false",
      "", "latin1", "latin1"
    );
  last unless @{$results->{resultElements}};
  # Tally the suffix (com, net, uk, ...) of each result's URL.
  map { $suffixes{ ($_->{URL} =~ m#://.+?\.(\w{2,4})/#)[0] }++ }
    @{$results->{resultElements}};
}
print map { qq{"$_", "$suffixes{$_}"\n} } sort keys %suffixes;
Invoke the script from the command line like so:
$ perl suffixcensus_csv.pl query="query" > results.csv
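If you would rather let a library handle the quoting, the output step might look like the following Python sketch. It assumes the counts have already been collected into a dictionary; the function name census_to_csv and the sample counts are illustrative only.

```python
import csv
import io


def census_to_csv(counts):
    """Render a {suffix: count} mapping as fully quoted CSV text, sorted by suffix."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["suffix", "count"])
    for suffix in sorted(counts):
        writer.writerow([suffix, counts[suffix]])
    return buffer.getvalue()


if __name__ == "__main__":
    # Sample counts, made up for the demonstration.
    print(census_to_csv({"com": 12, "za": 6, "info": 1, "net": 1}), end="")
```

The csv module takes care of escaping should a value ever contain a quote or a comma.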
Searching for mentions of "colddrink," the South African version
of "soda pop," sending the output straight to the screen rather
than a results.csv file, looks like this:
$ perl suffixcensus_csv.pl query="colddrink"
"suffix","count"
"com", "12"
"info", "1"
"net", "1"
"za", "6"
Hack 34. Measure Google Mindshare
Measure the Google mindshare of a particular person within a
query domain.
Based on an idea by author Steven Johnson
(http://www.stevenberlinjohnson.com), this hack determines the
Google mindshare of a person within a particular set of Google
queried keywords. What's Willy Wonka's Google mindshare of
"Willy"? What percentage of "weatherman" does Al Roker hold?
Who has the greater "The Beatles" Google mindshare, Ringo
Starr or Paul McCartney? More importantly, what Google
mindshare of your industry does your company own?
Google mindshare is calculated as follows. Determine the size of
the result set for a keyword or phrase. Determine the result set
size for that query along with a particular person. Divide the
second by the first and multiply by 100, yielding percent Google
mindshare. For example: a query for Willy yields about
2,760,000 results. "Willy Wonka" +Willy finds 133,000. We can
conclude, however unscientifically, that Willy Wonka holds roughly
a 5% ((133,000 / 2,760,000) x 100 = ~4.82) Google
mindshare of "Willy."
Sure, it's a little silly, but there's probably a grain of truth in it
somewhere.
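The arithmetic can be checked with a few lines of Python. This is only a sketch of the formula described above; the function name google_mindshare is made up, and the result counts are the ones quoted in the text rather than live figures.

```python
def google_mindshare(query_count, query_person_count):
    """Percentage of the query's result set that also mentions the person."""
    if query_count <= 0:
        raise ValueError("query_count must be positive")
    return query_person_count / query_count * 100


if __name__ == "__main__":
    # Counts quoted in the text for Willy and "Willy Wonka" +Willy.
    share = google_mindshare(2_760_000, 133_000)
    print(f"Willy Wonka has a {share:.2f}% googleshare of \"Willy\"")
```

The guard against a zero result count matters in practice: a query with no results would otherwise cause a division by zero.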
2.16.1. The Code
Save the following code as a CGI script ["How to Run the
Hacks" in the Preface] called google_mindshare.cgi in your web
site's cgi-bin directory.
#!/usr/local/bin/perl
# google_mindshare.cgi
# This implementation by Rael Dornfest,
# http://www.raelity.org/lang/perl/google/googleshare/
# Based on an idea by Steven Johnson,
# http://www.stevenberlinjohnson.com/movabletype/archives/000009.h
# Your Google API developer's key.
my $google_key='insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
use SOAP::Lite;
use CGI qw/:standard *table/;
print
header( ),
start_html("Googleshare Calculator"),
h1("Googleshare Calculator"),
start_form(-method=>'GET'),
'Query: ', br( ), textfield(-name=>'query'),
p( ),
'Person: ',br( ), textfield(-name=>'person'),
p( ),
submit(-name=>'submit', -value=>'Calculate'),
end_form( ), p( );
if (param('query') and param('person')) {
  my $google_search = SOAP::Lite->service("file:$google_wdsl");
  # Query Google for the keyword, keywords, or phrase.
  my $results = $google_search ->
    doGoogleSearch(
      $google_key, '"'.param('query').'"', 0, 1, "false", "", "false",
      "", "latin1", "latin1"
    );
  # Save the results for the Query.
  my $query_count = $results->{estimatedTotalResultsCount};
  # Query Google again for the keyword along with the person.
  $results = $google_search ->
    doGoogleSearch(
      $google_key, '+"'.param('query').'" +"'.param('person').'"', 0, 1,
      "false", "", "false", "", "latin1", "latin1"
    );
  # Save the results for the Query AND Person.
  my $query_person_count = $results->{estimatedTotalResultsCount};
  print
    p(
      b(sprintf "%s has a %.2f%% googleshare of %s",
        param('person'),
        ($query_person_count / $query_count * 100),
        '"'.param('query').'"'
      )
    );
}
print end_html( );
2.16.2. Running the Hack
Visit the CGI script in your browser. Enter a query and a person.
The name doesn't necessarily have to be a person's full name. It
can be a company, location, just about any proper noun, or
anything, actually. Click the Calculate button and enjoy. Figure
2-10 shows the Willy Wonka example.
Figure 2-10. Google mindshare for Willy Wonka
2.16.3. Fun Hack Uses
You can't do too many practical things with this hack, but you
can have a lot of fun with it. Playing unlikely percentages is fun;
see if you can find a name/word combo that gets a higher
percentage than other percentages that you would consider
more likely. Here are the answers to the questions posed at the
beginning of this hack, and more:
Willy Wonka has a 4.82% Google mindshare of "Willy."
Al Roker has a 2.47% Google mindshare of "weatherman."
Ringo Starr has a 1.55% Google mindshare of "The Beatles."
Paul McCartney has a 6.95% Google mindshare of "The Beatles."
Red Hat has a 5.08% Google mindshare of "Linux."
Microsoft has a 6.87% Google mindshare of "Linux."
Hack 35. SafeSearch Certify URLs
Feed URLs into Google's SafeSearch to determine whether they
point at questionable content.
Only three things in life are certain: death, taxes, and
accidentally visiting a once family-safe web site that now
contains text and images that would make a horse blush.
As you probably know if you've ever put up a web site, domain
names are registered for finite lengths of time. Sometimes
registrations accidentally expire; sometimes businesses fold
and allow the registrations to expire; sometimes other
companies take them over.
Other companies might just want the domain name, some
companies want the traffic that the defunct site generated, and
in a few cases, the new owners of the domain name try to hold it
hostage, offering to sell it back to the original owners for a great
deal of money. (This doesn't work as well as it used to because
of the dearth of Internet companies that actually have a great
deal of money.)
When a site isn't what it once was, that's no big deal. When it's
not what it once was and is now X-rated, that's a bigger deal.
When it's not what it once was, is now X-rated, and is on the link
list of a site you run, that's a really big deal.
But how to keep up with all the links? You can visit each link
periodically to determine if it's still okay, you can wait for
hysterical emails from site visitors, or you can just not worry
about it. Or you can put the Google API to work.
This program lets you check a list of URLs in Google's
SafeSearch mode. If they appear in the SafeSearch mode,
they're probably okay. If they don't appear, they're either not in
Google's index or not "safe" enough to pass through Google's
filter. The program then checks the URLs missing from a
SafeSearch with a nonfiltered search. If they do not appear in a
nonfiltered search, they're labeled as unindexed. If they do
appear in a nonfiltered search, they're labeled as "suspect."
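The decision logic just described boils down to two checks in sequence. Here is a small Python sketch of it, assuming the two result lists have already been fetched by some other means; the function name classify and its parameters are hypothetical, not part of the hack's code.

```python
def classify(url, safe_hits, unfiltered_hits):
    """Label a URL as safe, suspect, or unindexed.

    safe_hits and unfiltered_hits are lists of result URLs returned by a
    filtered (SafeSearch) and an unfiltered search for the URL.
    """
    if any(url in hit for hit in safe_hits):
        return "safe"       # showed up even with the filter on
    if any(url in hit for hit in unfiltered_hits):
        return "suspect"    # only showed up with the filter off
    return "unindexed"      # showed up in neither search


if __name__ == "__main__":
    print(classify("example.com/page",
                   [],
                   ["http://www.example.com/page"]))
```

As the caution in the next section explains, "safe" here means only that the filter let the page through.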
2.17.1. Danger, Will Robinson!
While Google's SafeSearch filter is good, it's not infallible. (I
have yet to see an automated filtering system that is infallible.)
So if you run a list of URLs through this hack and they all show
up in a SafeSearch query, don't take that as a guarantee that
they're all completely inoffensive. Take it merely as a pretty
good indication that they are. If you want absolute assurance,
you're going to have to visit every link personally and frequently.
Here's a fun idea if you need an Internet-related research
project. Take 500 or so domain names at random and run this
program on the list once a week for several months, saving the
results to a file each time. It'd be interesting to see how many
domains/URLs end up being filtered out of SafeSearch over time.
2.17.2. The Code
Save the following Perl source code as a text file named
suspect.pl:
#!/usr/local/bin/perl
# suspect.pl
# Feed URLs to a Google SafeSearch. If inurl: returns results, the
# URL probably isn't questionable content. If inurl: returns no
# results, either it points at questionable content or isn't in
# the Google index at all.
# Your Google API developer's key.
my $google_key = 'put your key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
use strict;
use SOAP::Lite;
$|++; # turn off buffering
my $google_search = SOAP::Lite->service("file:$google_wdsl");
# CSV header
print qq{"url","safe/suspect/unindexed"\n};
while (my $url = <>) {
  chomp $url;
  # Strip the scheme and any leading www. before searching.
  $url =~ s!^\w+?://!!;
  $url =~ s!^www\.!!;
  # SafeSearch
  my $results = $google_search ->
    doGoogleSearch(
      $google_key, "inurl:$url", 0, 10, "false", "", "true",
      "", "latin1", "latin1"
    );
  print qq{"$url",};
  # \Q...\E keeps characters such as . and ? in the URL from being
  # treated as regular expression metacharacters.
  if (grep /\Q$url\E/, map { $_->{URL} } @{$results->{resultElements}}) {
    print qq{"safe"\n};
  }
  else {
    # unSafeSearch
    my $results = $google_search ->
      doGoogleSearch(
        $google_key, "inurl:$url", 0, 10, "false", "", "false",
        "", "latin1", "latin1"
      );
    # Unsafe or Unindexed?
    print (
      (scalar grep /\Q$url\E/, map { $_->{URL} } @{$results->{resultElements}})
        ? qq{"suspect"\n}
        : qq{"unindexed"\n}
    );
  }
}
2.17.3. Running the Hack
To run the hack, you'll need a text file that contains the URLs
that you want to check, one line per URL. For example:
http://www.oreilly.com/catalog/essblogging/
http://www.xxxxxxxxxx.com/preview/home.htm
hipporhinostricow.com
The program runs from the command line ["How to Run the
Hacks" in the Preface]. Enter the name of the script, a less-than
sign, and the name of the text file that contains the URLs that
you want to check. The program will return results that look like
this:
% perl suspect.pl < urls.txt
"url","safe/suspect/unindexed"
"oreilly.com/catalog/essblogging/","safe"
"xxxxxxxxxx.com/preview/home.htm","suspect"
"hipporhinostricow.com","unindexed"
The first item is the URL being checked, and the second is its
probable safety rating as follows:
safe
The URL appeared in a Google SafeSearch for the URL.
suspect
The URL did not appear in a Google SafeSearch but did
in an unfiltered search.
unindexed
The URL appeared in neither a SafeSearch nor unfiltered
search.
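The URLs in the first column are shorter than the ones in the input file because the script strips the scheme and a leading www. before searching. A rough Python equivalent of those two substitutions, with a made-up function name, looks like this:

```python
import re


def normalize(url):
    """Strip the scheme and a leading www., as the script's two s!!! lines do."""
    url = url.strip()                    # plays the role of chomp
    url = re.sub(r"^\w+?://", "", url)   # drop http://, https://, ...
    url = re.sub(r"^www\.", "", url)     # drop a leading www.
    return url


if __name__ == "__main__":
    for line in ["http://www.oreilly.com/catalog/essblogging/",
                 "hipporhinostricow.com"]:
        print(normalize(line))
```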
You can redirect output from the script to a file for import into a
spreadsheet or database:
% perl suspect.pl < urls.txt > urls.csv
2.17.4. Hacking the Hack
You can use this hack interactively, feeding it URLs one at a
time. Invoke the script with perl suspect.pl, but don't feed it a
text file of URLs to check. Enter a URL and hit the return key on
your keyboard. The script will reply in the same manner that it
does when fed multiple URLs. This is handy when you just need
to spot-check a couple of URLs on the command line. When
you're ready to quit, break out of the script using Ctrl-D under
Unix or Ctrl-Break on a Windows command line.
Here's a transcript of an interactive session with suspect.pl:
% perl suspect.pl
"url","safe/suspect/unindexed"
http://www.oreilly.com/catalog/essblogging/
"oreilly.com/catalog/essblogging/","safe"
http://www.xxxxxxxxxx.com/preview/home.htm
"xxxxxxxxxx.com/preview/home.htm","suspect"
hipporhinostricow.com
"hipporhinostricow.com","unindexed"
^d
%
Hack 36. Search Google Topics
Run queries against some of the available Google API specialty
topics.
Google doesn't talk about it much, but it does make specialty
web searches available. And I'm not just talking about searches
limited to a certain domain. I'm talking about searches that are
devoted to a particular topic
(http://www.google.com/options/specialsearches.html). The
Google API makes four of these searches available: the U.S.
Government, Linux, BSD, and Macintosh.
In this hack, we'll look at a program that takes a query from a
form and provides a count of results for that query in each
specialty topic, as well as the top result for each topic. This
program runs via a form.
2.18.1. Why Topic Search?
Why would you want to topic search? Because Google currently
indexes over eight billion pages. If you try to do more than very
specific searches, you might find yourself with far too many
results. If you narrow down your search by topic, you can get
good results without having to exactly zero in on your search.
You can also use it to do some decidedly unscientific research.
Which topic contains more iterations of the phrase "open
source"? Which contains the most pages from .edu (educational)
domains? Which topic, Macintosh or FreeBSD, has more on user
interfaces? Which topic holds the most for Monty Python fans?
2.18.2. The Code
Save the following code as a CGI script ["How to Run the
Hacks" in the Preface] named gootopic.cgi in the cgi-bin directory
on your web server:
#!/usr/local/bin/perl
# gootopic.cgi
# Queries across Google Topics (and All of Google), returning
# number of results and top result for each topic.
# gootopic.cgi is called as a CGI with form input
# Your Google API developer's key.
my $google_key='insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
# Google Topics
my %topics = (
'' => 'All of Google',
unclesam => 'U.S. Government',
linux => 'Linux',
mac => 'Macintosh',
bsd => 'FreeBSD'
);
use strict;
use SOAP::Lite;
use CGI qw/:standard *table/;
# Display the query form.
print
header( ),
start_html("GooTopic"),
h1("GooTopic"),
start_form(-method=>'GET'),
'Query: ', textfield(-name=>'query'), ' ',
submit(-name=>'submit', -value=>'Search'),
end_form( ), p( );
my $google_search = SOAP::Lite->service("file:$google_wdsl");
# Perform the queries, one for each topic area.
if (param('query')) {
  print
    start_table({-cellpadding=>'10', -border=>'1'}),
    Tr([th({-align=>'left'}, ['Topic', 'Count', 'Top Result'])]);
  foreach my $topic (keys %topics) {
    my $results = $google_search ->
      doGoogleSearch(
        $google_key, param('query'), 0, 10, "false", $topic, "false",
        "", "latin1", "latin1"
      );
    my $result_count = $results->{'estimatedTotalResultsCount'};
    my $top_result = 'no results';
    if ( $result_count ) {
      my $t = @{$results->{'resultElements'}}[0];
      # Title, linked URL, and snippet of the first result.
      $top_result =
        b($t->{title}||'no title') . br( ) .
        a({href=>$t->{URL}}, $t->{URL}) . br( ) .
        i($t->{snippet}||'no snippet');
    }
    # Output
    print Tr([ td([
      $topics{$topic},
      $result_count,
      $top_result
      ])
    ]);
  }
  print
    end_table( );
}
print end_html( );
Be sure to replace insert key here with your Google API key.
2.18.3. Running the Hack
Point your web browser at gootopic.cgi.
Provide a query and the script will search for your query in each
special topic area, providing you with an overall ("All of Google")
count, topic area count, and the top result for each. Figure 2-11
shows a sample run for "user interface", with Macintosh
(surprisingly) not coming out on top.
Figure 2-11. Topic search for "user interface"
2.18.4. Search Ideas
Trying to figure out how many pages each topic finds for
particular top-level domains (e.g., .com, .edu, .uk) is rather
interesting. You can query for inurl:xx site:xx, where xx is the
top-level domain you're interested in. For example, inurl:va
site:va searches for any of the Vatican's pages in the various
topics; there aren't any. inurl:mil site:mil finds an
overwhelming number of results in the U.S. Government special
topic; no surprise there.
If you are in the mood for a party game, try to find the weirdest
possible searches that appear in all the special topics.
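To run such comparisons in bulk, it helps to generate the queries and collect the counts per topic in one place. The Python sketch below is an outline only: the topic keys are copied from the script, while tld_query, topic_counts, and the count_results callable are hypothetical names, and the search itself is left to whatever function you supply.

```python
# Topic restrict keys and labels, as listed in gootopic.cgi.
TOPICS = {
    "": "All of Google",
    "unclesam": "U.S. Government",
    "linux": "Linux",
    "mac": "Macintosh",
    "bsd": "FreeBSD",
}


def tld_query(tld):
    """Build the inurl:xx site:xx query for a top-level domain such as 'edu'."""
    tld = tld.lstrip(".").lower()
    return f"inurl:{tld} site:{tld}"


def topic_counts(query, count_results):
    """Call count_results(query, restrict) once per topic and label the counts."""
    return {label: count_results(query, restrict)
            for restrict, label in TOPICS.items()}


if __name__ == "__main__":
    # Stand-in for a real search call; the numbers are invented.
    def fake_count(query, restrict):
        return {"": 1000, "unclesam": 900}.get(restrict, 0)

    for label, count in topic_counts(tld_query(".mil"), fake_count).items():
        print(f"{label}: {count}")
```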
Hack 37. Find the Largest Page
We all know about Feeling Lucky with Google. But how about
Feeling Large?
Google sorts your search results by PageRank. Certainly makes
sense. Sometimes, however, you may have a substantially
different focus in mind and want things ordered in some other
manner. Recency is one that comes to mind. Size is another.
In the same manner that Google's "I'm Feeling Lucky" button
redirects you to the search result with the highest PageRank,
this hack sends you directly to the largest (in kilobytes).
This hack works rather nicely in
combination with repetition [Hack #15].
2.19.1. The Code
Save the following code as a CGI script ["How to Run the
Hacks" in the Preface] named goolarge.cgi in your web server's
cgi-bin directory. Be sure to replace insert key here with your
Google API key.
#!/usr/local/bin/perl
# goolarge.cgi
# A take-off on "I'm Feeling Lucky," redirects the browser to the
# largest (size in K) document found in the first n results. n is
# set by the number of loops x 10 results per loop.
# goolarge.cgi is called as a CGI with form input
# Your Google API developer's key.
my $google_key='insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
# Number of times to loop, retrieving 10 results at a time.
my $loops = 10;
use strict;
use SOAP::Lite;
use CGI qw/:standard/;
# Display the query form.
unless (param('query')) {
print
header( ),
start_html("GooLarge"),
h1("GooLarge"),
start_form(-method=>'GET'),
'Query: ', textfield(-name=>'query'),
' ',
submit(-name=>'submit', -value=>"I'm Feeling Large"),
end_form( ), p( );
}
# Run the query.
else {
  my $google_search = SOAP::Lite->service("file:$google_wdsl");
  my($largest_size, $largest_url) = (0, undef);
  # Make $loops visits to Google, 10 results per visit.
  for (my $offset = 0; $offset < $loops*10; $offset += 10) {
    my $results = $google_search ->
      doGoogleSearch(
        $google_key, param('query'), $offset,
        10, "false", "", "false", "", "latin1", "latin1"
      );
    # Stop when Google has no more results to offer.
    @{$results->{'resultElements'}} or last;
    # Keep track of the largest size and its associated URL.
    foreach (@{$results->{'resultElements'}}) {
      substr($_->{cachedSize}, 0, -1) > $largest_size and
        ($largest_size, $largest_url) =
          (substr($_->{cachedSize}, 0, -1), $_->{URL});
    }
  }
  # Redirect the browser to the largest result, if there is one.
  if (defined $largest_url) {
    print redirect $largest_url;
  }
  else {
    print header( ), p('No results');
  }
}
2.19.2. Running the Hack
Point your web browser at the goolarge.cgi CGI script. Enter a
query and click the "I'm Feeling Large" button. You'll be
transported directly to the largest page matching your query
(within the first specified number of results, that is; the default
is 100 results: 10 loops of 10 results apiece).
2.19.3. Usage Examples
Perhaps you're looking for bibliographic information on a famous
person. You might find that a regular Google search doesn't net
you any more than a mention on a plethora of content-light web
pages. Running the same query through this hack sometimes
turns up pages with extensive bibliographies.
Maybe you're looking for information about a state. Try queries
for the state name along with related information, such as motto,
capital, or state bird.
2.19.4. Hacking the Hack
This hack isn't so much hacked as tweaked. By changing the
value assigned to the $loops variable in my $loops = 10;, you can
alter the number of results that the script checks before
redirecting you to the largest result. Remember, the maximum
number of results is the number of loops multiplied by 10 results
per loop. The default of 10 considers the top 100 results. A
$loops value of 5 would consider only the top 50; 20, the top
200; and so forth.
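The selection step can be tried on canned data. The following Python sketch assumes each result is a dictionary with URL and cachedSize keys, where cachedSize is a string such as "12k" as the script's substr call suggests; the function names are made up for illustration.

```python
def results_considered(loops, per_loop=10):
    """Maximum number of results examined for a given $loops value."""
    return loops * per_loop


def size_in_k(cached_size):
    """Turn a size string such as '12k' into 12; empty or odd values give 0."""
    digits = cached_size[:-1]
    return int(digits) if digits.isdigit() else 0


def largest_result(results):
    """Return the result with the biggest cached size, or None if none has one."""
    sized = [r for r in results if size_in_k(r.get("cachedSize", "")) > 0]
    return max(sized, key=lambda r: size_in_k(r["cachedSize"]), default=None)


if __name__ == "__main__":
    sample = [
        {"URL": "http://example.com/short", "cachedSize": "12k"},
        {"URL": "http://example.com/long", "cachedSize": "101k"},
        {"URL": "http://example.com/uncached", "cachedSize": ""},
    ]
    print(results_considered(10))
    print(largest_result(sample)["URL"])
```

Comparing the sizes as numbers rather than strings matters: as text, "12k" would sort after "101k".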
Hack 38. Perform Proximity Searches
GAPS performs a proximity check between two words.
Sometimes it would be advantageous to search both forward and
backward. For example, if you're doing genealogy research, you
might find your uncle John Smith as either "John Smith" or "Smith
John." Similarly, some pages might include John's middle
initial: "John Q Smith" or "Smith John Q."
If all you're after is query permutations,
[Hack #28] might do the trick.
You might also need to find concepts that exist near each other
but don't make up a phrase. For example, you might want to
learn about keeping squirrels out of your bird feeder. Various
attempts to create a phrase based on this idea might not work,
but just searching for several words might not find specific
enough results.
GAPS, created by Kevin Shay, allows you to run searches both
forward and backward and within a certain number of words of
each other. GAPS stands for Google API Proximity Search, and
that's exactly what this application is: a way to search for topics
within a few words of each other without having to run several
queries in a row. The program runs the queries and automatically
organizes the results.
You enter two terms (there is an option to add more terms that
will not be searched for in proximity) and specify how far apart
you want them (1, 2, or 3 words). You can specify that the words
be found only in the order you request (wordA, wordB) or in either
order (wordA, wordB, and wordB, wordA). You can specify how
many results you want and in what order they appear (sorted by
title, URL, ranking, or proximity).
Search results are formatted much like regular Google results,
only a distance ranking is included beside each title. The
distance ranking, between one and three, specifies how far apart
the two query words were on the page. Figure 2-12 shows a
GAPS search for google and hacks within two words of one
another, order intact.
Figure 2-12. GAPS search for "google" and
"hacks" within two words of one another
Click the distance rating link to pass the generated query on to
Google directly.
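One plausible way to approximate a proximity search is to place Google's full-word wildcard (*) between the two terms inside a quoted phrase, once for each distance and word order. The Python sketch below illustrates that idea only; it is an assumption about the general technique, not a description of how GAPS itself builds its queries, and the function name and the meaning given to "distance" (the number of wildcards) are made up.

```python
def proximity_queries(word_a, word_b, max_distance=3, either_order=True):
    """Build (distance, query) pairs with 1..max_distance wildcards between words."""
    queries = []
    for distance in range(1, max_distance + 1):
        gap = " ".join(["*"] * distance)
        queries.append((distance, f'"{word_a} {gap} {word_b}"'))
        if either_order:
            # Same distance, words reversed.
            queries.append((distance, f'"{word_b} {gap} {word_a}"'))
    return queries


if __name__ == "__main__":
    for distance, query in proximity_queries("google", "hacks", 2,
                                             either_order=False):
        print(distance, query)
```

Each generated query would count against the daily API allotment, which is one reason to keep the maximum distance small.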
2.20.1. Making the Most of GAPS
GAPS works best when you have words on the same page that
are ambiguously or not at all related to one another. For example,
if you're looking for information on Google and search engine
optimization (SEO), you might find that searching for the words
Google and SEO doesn't find the results that you want, while
using GAPS to search for the words Google and SEO within three
words of each other finds material focused much more on search
engine optimization for Google.
GAPS also works well when you're searching for information
about two famous people who might often appear on the same
page, though not necessarily in proximity to each other. For
example, you might want information on Bill Clinton and Alan
Greenspan, but might find that you're getting too many pages
that happen to list the two of them. By searching for their names
in proximity to each other, you'll get better results.
Finally, you might find GAPS useful in medical research. Many
times your search results will include index pages that list
several symptoms. However, including symptoms or other
medical terms within a few words of each other can help you find
more relevant results. Note that this technique will take some
experimentation. Many pages about medical conditions contain
long lists of symptoms and effects, and there's no guarantee that
one symptom will be within a few words of another.
2.20.2. The Code
The GAPS source code is rather lengthy, so we're not making it
available here. You can, however, get it online at
http://www.staggernation.com/gaps/readme.html.
2.20.3. See Also
If you like GAPS, you might want to try a couple of other scripts
from Staggernation:
GAWSH (http://www.staggernation.com/gawsh)
Stands for Google API Web Search by Host. This
program allows you to enter a query and get a list of
domains that contain information on that query. If you
click on the triangle beside any domain name, you'll get
a list of pages in that domain that match your query.
This program uses DHTML, which means that it'll only
work with Internet Explorer or Mozilla/Netscape.
GARBO (http://www.staggernation.com/garbo)
Stands for Google API Relation Browsing Outliner. Like
GAWSH, this program uses DHTML, so it'll work only
with Mozilla/Netscape and Internet Explorer. When you
enter a URL, GARBO will do a search for either pages
that link to the URL you specify or pages related to that
URL. Run a search and you'll get a list of URLs with
triangles beside them. If you click on a triangle, you'll
get a list of pages that either link to the URL you chose
or are related to the URL you chose, depending on what
you chose in the initial query.
Hack 39. Meander Your Google Neighborhood
Google Neighborhood attempts to detangle the Web by building
a "neighborhood" of sites around a URL.
It's called the World Wide Web, not the World Wide Straight Line.
Sites link to other sites, building a web of sites. And what a
tangled web we weave.
Google Neighborhood by the Python-wise Mark Pilgrim
(http://diveintomark.org) attempts to detangle some small
portion of the Web by using the Google API to find sites related
to a URL that you provide, scraping the links on the sites
returned and building a "neighborhood" of sites that link to both
the original URL and each other.
If you'd like to give this hack a whirl without having to run it
yourself, there's a live version available at
http://diveintomark.org/archives/2002/06/04/who_are_the_peopl
The source code (included in the following section) for Google
Neighborhood is available for download from
http://diveintomark.org/projects/misc/neighbor.py.txt.
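The heart of the neighborhood idea is a tally: once the links on each related site have been collected, count how many of those sites point at each URL. Here is a simplified sketch of that step in present-day Python, assuming the link lists are already in hand; the function name rank_neighbors and the sample data are invented, and the real script does considerably more (fetching, filtering, threading).

```python
from collections import Counter


def rank_neighbors(blogrolls):
    """Rank URLs by how many sites link to them.

    blogrolls maps a site's URL to the list of URLs found on that site.
    Returns (url, count) pairs, most-linked first, ties in URL order.
    """
    tally = Counter()
    for links in blogrolls.values():
        tally.update(set(links))  # count each site at most once per URL
    return sorted(tally.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = {
        "a.example": ["x.example", "y.example"],
        "b.example": ["x.example"],
        "c.example": ["x.example", "y.example", "z.example"],
    }
    for url, count in rank_neighbors(sample):
        print(count, url)
```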
2.21.1. The Code
Google Neighborhood is written in the Python
(http://www.python.org) programming language. Your system will
need to have Python installed for you to run this hack. The code
targets Python 2: it relies on the sgmllib and urlparse modules and
on the <> operator, none of which exist in Python 3.
"""Blogroll finder and aggregator"""
__copyright__ = "Copyright 2002, Mark Pilgrim"
__license__ = "Python"
try:
    import timeoutsocket # http://www.timo-tasi.org/python/timeout
    timeoutsocket.setDefaultSocketTimeout(10)
except:
    pass
import urllib, urlparse, os, time, operator, sys, pickle, re, cgi,
from sgmllib import SGMLParser
from threading import *
BUFFERSIZE = 1024
IGNOREEXTS = ('.xml', '.opml', '.rss', '.rdf', '.pdf', '.doc')
INCLUDEEXTS = ('', '.html', '.htm', '.shtml', '.php', '.asp', '.js
IGNOREDOMAINS = ('cgi.alexa.com', 'adserver1.backbeatmedia.com', '
'freshmeat.net', 'readroom.ipl.org', 'amazon.com', 'ringsurf.com')
def prettyURL(url):
    protocol, domain, path, params, query, fragment = urlparse.url
    if path == '/':
        path = ''
    return urlparse.urlunparse(('', domain, path, '', '', '')).rep
def simplifyURL(url):
    url = url.replace('www.', '')
    url = url.replace('/coming.html', '/')
    protocol, domain, path, params, query, fragment = urlparse.url
    if path == '':
        url = url + '/'
    return url
class MinimalURLOpener(urllib.FancyURLopener):
    def __init__(self, *args):
        apply(urllib.FancyURLopener.__init__, (self,) + args)
        self.addheaders = [('User-agent', '')]
    def http_error_401(self, url, fp, errcode, errmsg, headers, da
        pass
class BlogrollParser(SGMLParser):
    def __init__(self, url):
        SGMLParser.__init__(self)
        self.url = url
        self.reset( )
    def reset(self):
        SGMLParser.reset(self)
        self.possible = []
        self.blogroll = []
        self.ina = 0
    def _goodlink(self, href):
        protocol, domain, path, params, query, fragment = urlparse
        if protocol.lower( ) <> 'http': return 0
        if self.url.find(domain) <> -1: return 0
        if domain in IGNOREDOMAINS: return 0
        if domain.find(':5335') <> -1: return 0
        if domain.find('.google') <> -1: return 0
        if fragment: return 0
        shortpath, ext = os.path.splitext(path)
        ext = ext.lower( )
        if ext in INCLUDEEXTS: return 1
        if ext.lower( ) in IGNOREEXTS: return 0
        # more rules here?
        return 1
    def _confirmpossibles(self):
        # A run of four or more consecutive links is treated as a blogroll.
        if len(self.possible) >= 4:
            for url in self.possible:
                if url not in self.blogroll:
                    self.blogroll.append(url)
        self.possible = []
    def start_a(self, attrs):
        self.ina = 1
        hreflist = [e[1] for e in attrs if e[0]=='href']
        if not hreflist: return
        href = simplifyURL(hreflist[0])
        if self._goodlink(href):
            self.possible.append(href)
    def end_a(self):
        self.ina = 0
    def handle_data(self, data):
        if self.ina: return
        # Text outside a link ends the current run of links.
        if data.strip( ):
            self._confirmpossibles( )
    def end_html(self):
        # SGMLParser calls end-tag handlers without arguments.
        self._confirmpossibles( )
def getRadioBlogroll(url):
    try:
        usock = MinimalURLOpener().open('%s/gems/mySubscriptions.
        opmlSource = usock.read()
        usock.close()
    except:
        return []
    if opmlSource.find('<opml') == -1: return []
    radioBlogroll = []
    start = 0
    while 1:
        p = opmlSource.find('htmlUrl="', start)
        if p == -1: break
        refurl = opmlSource[p:p+100].split('"')[1]
        radioBlogroll.append(refurl)
        start = p + len(refurl) + 10
    return radioBlogroll

def getBlogroll(url):
    if url[:7] <> 'http://':
        url = 'http://' + url
    radioBlogroll = getRadioBlogroll(url)
    if radioBlogroll:
        return radioBlogroll
    parser = BlogrollParser(url)
    try:
        usock = MinimalURLOpener().open(url)
        htmlSource = usock.read()
        usock.close()
    except:
        return []
    parser.feed(htmlSource)
    return parser.blogroll
class BlogrollThread(Thread):
    def __init__(self, master, url):
        Thread.__init__(self)
        self.master = master
        self.url = url
    def run(self):
        self.master.callback(self.url, getBlogroll(self.url))

class BlogrollThreadMaster:
    def __init__(self, url, recurse):
        self.blogrollDict = {}
        self.done = 0
        if type(url)==type(''):
            blogroll = getBlogroll(url)
        else:
            blogroll = url
        self.run(blogroll, recurse)
    def callback(self, url, blogroll):
        if not self.done:
            self.blogrollDict[url] = blogroll
    def run(self, blogroll, recurse):
        # Fetch blogrolls five at a time, one thread per URL.
        start = 0
        end = 5
        while 1:
            threads = []
            for url in blogroll[start:end]:
                if not self.blogrollDict.has_key(url):
                    t = BlogrollThread(self, url)
                    threads.append(t)
            for t in threads:
                t.start()
                time.sleep(0.000001)
            for t in threads:
                time.sleep(0.000001)
                t.join(10)
            start += 5
            end += 5
            if start > len(blogroll): break
        if recurse > 1:
            masterlist = reduce(operator.add, self.blogrollDict.va
            newlist = [url for url in masterlist if not self.blogr
            self.run(newlist, recurse - 1)
        else:
            self.done = 1
def sortBlogrollData(blogrollDict):
    sortD = {}
    for blogroll in blogrollDict.values():
        for url in blogroll:
            sortD[url] = sortD.setdefault(url, 0) + 1
    sortI = [(v, k) for k, v in sortD.items()]
    sortI.sort()
    sortI.reverse()
    return sortI

def trimdata(sortI, cutoff):
    return [(c, url) for c, url in sortI if c >= cutoff]

def getRelated(url):
    import google
    results = []
    start = 0
    for i in range(3):
        data = google.doGoogleSearch('related:%s' % url, start)
        results.extend([oneResult.URL for oneResult in data.result
        start += 10
        if len(data.results) < 10: break
    return results

def getNeighborhood(baseURL):
    relatedList = getRelated(baseURL)
    blogrollDict = BlogrollThreadMaster(relatedList, 1).blogrollDi
    neighborhood = sortBlogrollData(blogrollDict)
    neighborhood = trimdata(neighborhood, 2)
    neighborhood = [(c,url, prettyURL(url)) for c,url in neighborh
    return neighborhood
def render_html(baseURL, data):
    output = []
    output.append("""
<table class="socialnetwork" summary="neighborhood for %s">
<caption>Neighborhood for %s</caption>
<thead>
<tr>
<th scope="col">Name</th>
<th scope="col">Links</th>
<th scope="col">Explore</th>
</tr>
</thead>
<tbody>""" % (cgi.escape(prettyURL(baseURL)), cgi.escape(prettyURL
    for c, url, title in data:
        output.append("""<tr><td><a href="%s">%s</a></td>
<td>%s</td><td><a href="%s">explore</a></td></tr
>""" % (url, title, c, 'http://diveintomark.org/cgi-bin/neighborho
cgi.escape(url)))
    output.append("""
</tbody>
</table>""")
    return "".join(output)
def render_rss(baseURL, data):
    title = prettyURL(baseURL)
    channeltitle = "%s neighborhood" % title
    localtime = time.strftime('%Y-%m-%dT%H:%M:%S-05:00', time.loca
    output = []
    output.append("""<?xml version="1.0"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="
elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndicati
"http://webns.net/mvcb/">
<channel rdf:about="%(baseURL)s">
<title>%(channeltitle)s</title>
<link>%(baseURL)s</link>
<description>Sites in the virtual neighborhood of %(title)s</descr
<language>en-us</language>
<lastBuildDate>%(localtime)s</lastBuildDate>
<pubDate>%(localtime)s</pubDate>
<admin:generatorAgent rdf:resource="http://diveintomark.org/cgi-bin
?v=1.1" />
<admin:errorReportsTo rdf:resource="mailto:[email protected]"/
<sy:updatePeriod>weekly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>
<items>
<rdf:Seq>
""" % locals())
    ##"""
    for c, url, title in data:
        output.append("""<rdf:li rdf:resource="%s" />
""" % url)
    output.append("""</rdf:Seq>
</items>
</channel>
""")
    for c, url, title in data:
        output.append("""<item rdf:about="%(url)s">
<title>%(title)s</title>
<link>%(url)s</link>
<description>%(c)s links</description>
</item>
""" % locals())
    output.append("""</rdf:RDF>""")
    return "".join(output)

if __name__ == '__main__':
    # render_html() takes the base URL as well as the neighborhood data.
    print render_html(sys.argv[1], getNeighborhood(sys.argv[1]))
You'll also need an HTML form to call the neighborhood.cgi script.
Here's a simple one:
<form action="/cgi-bin/neighborhood.cgi" method="get">
URL: <input name="url" type="text" />
<br />
Output as: <input name="fl" type="radio" value="html" checked="tru
<input name="fl" type="radio" value="rss" /> RSS
<br />
<input type="submit" value="Meander" />
</form>
Save the form as neighborhood.html, being sure to alter the
action= to point at the location in which you installed the CGI
script ["How to Run the Hacks" in the Preface].
2.21.2. Running the Hack
Point your browser at the location of the form you saved just a
moment ago. Provide it with the URL that you're interested in
using as the center, select HTML or RSS output, and hit the
Meander button.
Figure 2-13 shows a representation of Rael's (raelity.org's, to be
precise) Google Neighborhood. Clicking on any of the links on
the left transports you to the URL shown. More interestingly, the
"explore" link shifts your point of view, centering the
neighborhood on the associated URL. You can thus meander a
neighborhood to your heart's content; don't be surprised,
especially in the blogging world, if you keep coming across the
same links. Speaking of links, the number listed beneath the
"Links" heading represents the number of links the associated
site has to the currently focused site.
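In the script, the "Links" figure comes out of sortBlogrollData and trimdata, which tally how many of the gathered blogrolls mention each URL and then drop anything below a cutoff of two. The following is a minimal Python 3 sketch of that tallying step, not the script's own code; the function names and the example.com-style URLs are made up for illustration.

```python
from collections import Counter

def sort_blogroll_data(blogroll_dict):
    """Count how many times each URL is listed across all blogrolls,
    returning (count, url) pairs with the most-listed URLs first."""
    counts = Counter()
    for blogroll in blogroll_dict.values():
        counts.update(blogroll)
    return sorted(((n, url) for url, n in counts.items()), reverse=True)

def trim_data(pairs, cutoff):
    """Keep only URLs listed at least `cutoff` times."""
    return [(n, url) for n, url in pairs if n >= cutoff]

# Hypothetical blogrolls gathered from three related sites.
blogrolls = {
    'a.example': ['x.example/', 'y.example/'],
    'b.example': ['x.example/', 'z.example/'],
    'c.example': ['x.example/', 'y.example/'],
}
neighborhood = trim_data(sort_blogroll_data(blogrolls), 2)
```

With a cutoff of two, a URL that appears in only one blogroll never reaches the results table.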
Figure 2-13. raelity.org's Google Neighborhood
2.21.3. Hacking the Hack
If you want to hack this hack, concentrate your efforts on a
small block of code, specifying what file extensions you want to
include and exclude, as well as what domains you want to
exclude when calculating your neighborhoods:
IGNOREEXTS = ('.xml', '.opml', '.rss', '.rdf', '.pdf', '.doc')
INCLUDEEXTS = ('', '.html', '.htm', '.shtml', '.php', '.asp', '.js
IGNOREDOMAINS = ('cgi.alexa.com', 'adserver1.backbeatmedia.com', '
slashdot.org','freshmeat.net', 'readroom.ipl.org', 'amazon.com',
'ringsurf.com')
2.21.3.1 Noticing/ignoring file extensions
The way the hack is currently written, the neighborhood is built
around pretty standard files. However, you could create a
neighborhood of sites served by PHP (http://www.php.net),
including only URLs with a PHP (.php) extension. Or perhaps
your interest lies in Word documents and PDF files. You'd alter
the code as follows:
IGNOREEXTS = ('.xml', '.opml', '.rss', '.rdf', '.html', '.htm', '.
'.php', '.asp', '.jsp')
INCLUDEEXTS = ('', '.pdf', '.doc')
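To see how the two tuples interact, here is a rough Python 3 sketch of the extension test that _goodlink performs. The tuples are a shortened, illustrative subset rather than the script's full lists, and good_extension is a hypothetical helper name.

```python
import os.path
from urllib.parse import urlparse

# Illustrative subsets only; the script's real tuples are longer.
IGNOREEXTS = ('.html', '.htm', '.php')
INCLUDEEXTS = ('', '.pdf', '.doc')

def good_extension(url):
    """Mirror the order of checks in _goodlink: include list first,
    then ignore list, and accept anything listed in neither."""
    path = urlparse(url).path
    ext = os.path.splitext(path)[1].lower()
    if ext in INCLUDEEXTS:
        return True
    if ext in IGNOREEXTS:
        return False
    return True
```

Note that an extension listed in neither tuple is still accepted, so narrowing a neighborhood to a few file types means adding the unwanted extensions to IGNOREEXTS as well as adding the wanted ones to INCLUDEEXTS.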
2.21.3.2 Ignoring domains
Sometimes, when building a neighborhood, you might notice that
the same links are popping up again and again. They're not
really part of the neighborhood but tend to be places that the
web pages making up your neighborhood often link to. For
example, most Blogger-based weblogs include a link to
http://www.blogger.com as a matter of course.
Exclude domains that hold no interest to you by adding them to
the IGNOREDOMAINS list:
IGNOREDOMAINS = ('cgi.alexa.com', 'adserver1.backbeatmedia.com',
'ask.slashdot.org', 'freshmeat.net', 'readroom.ipl.org', 'amazon.c
'ringsurf.com', 'blogger.com')
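The reason 'blogger.com' is enough to catch http://www.blogger.com is that simplifyURL strips "www." before _goodlink compares the domain against the list. The Python 3 sketch below imitates that pair of steps; the shortened IGNOREDOMAINS tuple and the is_ignored helper are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Illustrative subset of the script's list.
IGNOREDOMAINS = ('amazon.com', 'ringsurf.com', 'blogger.com')

def simplify_url(url):
    """Drop 'www.' the way the script's simplifyURL does."""
    return url.replace('www.', '')

def is_ignored(url):
    """True when the URL's host exactly matches an ignored domain."""
    domain = urlparse(simplify_url(url)).netloc
    return domain in IGNOREDOMAINS
```

Because the comparison is an exact match on the host, a subdomain such as help.blogger.com would need its own entry in the list.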
Hack 40. Run a Google Popularity Contest
Put two terms, spelling variations, animals, vegetables, or
minerals head to head in a Google-based popularity contest.
Which is the most popular word? Which spelling is more
commonly used? Who gets more mentions, Fred or Ethel Mertz?
These and other equally critical questions are answered by
Google Smackdown
(http://www.onfocus.com/googlesmack/down.asp).
Google Smackdown was written by Paul
Bausch (http://www.onfocus.com/).
Why would you want to compare search counts? Sometimes
finding out which terms appear more often can help you develop
your queries better. Why use a particular word if it gets almost no
results? Comparing misspellings can provide leads on hard-to-find
terms or phrases. And sometimes it's just fun to run a
popularity contest.
If you're just searching for keywords, Google Smackdown is very
simple. Enter one word in each query box, a Google Web API
developer's key [Chapter 9] if you have one, and click the "throw
down!" button. Smackdown will return the winner and
approximate count of each search.
If you're planning to use a special syntax, you'll have to be more
careful. Unfortunately, the link: syntax doesn't work.
Interestingly, phonebook: does; do more people named Smith or
Jones live in Boston, MA?
To use any special syntaxes, enclose the query in quotes:
"intitle:windows".
The next tip is a little backwards. If you want to specify a
phrase, do not use quotes; Smackdown, by default, searches for
a phrase. If you want to search for the two words on one page but
not necessarily as a phrase (jolly and roger versus "jolly roger"),
do use quotes. Special syntaxes and phrases work this way
because the program automatically encloses every query
in quotes, and if you add quotes, you're sending a
double quoted query to Google (""Google""). When Google runs
into a double quote like that, it just strips out all the quotes.
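The quoting rule is easier to follow when written out. The Python 3 sketch below models the two steps described above: Smackdown wrapping the term in quotes, and Google discarding every quote when it meets a doubled one. The second function is only a model of the behavior as this hack describes it, and both function names are hypothetical.

```python
def smackdown_query(term):
    """Wrap the user's term in quotes, as Smackdown does before querying."""
    return '"' + term + '"'

def as_google_reads_it(query):
    """Model of the described behavior: a doubled quote makes Google
    strip out all quotes; otherwise the query is left alone."""
    if '""' in query:
        return query.replace('"', '')
    return query

plain = as_google_reads_it(smackdown_query('jolly roger'))
quoted = as_google_reads_it(smackdown_query('"jolly roger"'))
syntax = as_google_reads_it(smackdown_query('"intitle:windows"'))
```

Typing the words bare yields a phrase search, while typing them in quotes yields an ordinary search for both words, which is also what lets a special syntax through unquoted.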
If you'd like to try a Google Smackdown without
having to run it yourself, there's a live version
available at:
http://www.onfocus.com/googlesmack/down.asp.
2.22.1. The Code
Google Smackdown is written for ASP pages running under the
Windows operating system and Microsoft Internet Information
Server (IIS):
<%
'-----------------------------------------------------------
' Set the global variable strGoogleKey.
'-----------------------------------------------------------
Dim strGoogleKey
strGoogleKey = "your key goes here"
'-----------------------------------------------------------
' The function GetResult( ) is the heart of Google Smackdown.
' It queries Google with a given word or phrase and returns
' the estimated total search results for that word or phrase.
' By running this function twice with the two words the user
' enters into the form, we have our Smackdown.
'-----------------------------------------------------------
Function GetResult(term)
'-----------------------------------------------------------
' Set the variable that contains the SOAP request. A SOAP
' software package will generate a similar request to this
' one behind the scenes, but the query for this application
' is very simple so it can be set "by hand."
'-----------------------------------------------------------
strRequest = "<?xml version='1.0' encoding='UTF-8'?>" & Chr(13) &
& Chr(13) & Chr(10)
strRequest = strRequest & "<SOAP-ENV:Envelope xmlns:SOAP-ENV=""ht
xmlsoap.org/soap/envelope/"" xmlns:xsi=""http://www.w3.org/1999/XM
xmlns:xsd=""http://www.w3.org/1999/XMLSchema"">" & Chr(13) & Chr(1
strRequest = strRequest & " <SOAP-ENV:Body>" & Chr(13) & Chr(10)
strRequest = strRequest & " <ns1:doGoogleSearch xmlns:ns1=""urn:G
SOAP-ENV:encodingStyle=""http://schemas.xmlsoap.org/soap/encoding/
& Chr(10)
strRequest = strRequest & " <key xsi:type=""xsd:string"">" & str
& "</key>" & Chr(13) & Chr(10)
strRequest = strRequest & " <q xsi:type=""xsd:string"">""" & ter
"""</q>" & Chr(13) & Chr(10)
strRequest = strRequest & " <start xsi:type=""xsd:int"">0</start
& Chr(13) & Chr(10)
strRequest = strRequest & " <maxResults xsi:type=""xsd:int"">1</
maxResults>" & Chr(13) & Chr(10)
strRequest = strRequest & " <filter xsi:type=""xsd:boolean"">tru
filter>" & Chr(13) & Chr(10)
strRequest = strRequest & " <restrict xsi:type=""xsd:string""></
restrict>" & Chr(13) & Chr(10)
strRequest = strRequest & " <safeSearch xsi:type=""xsd:boolean""
safeSearch>" & Chr(13) & Chr(10)
strRequest = strRequest & " <lr xsi:type=""xsd:string""></lr>" &
Chr(13) & Chr(10)
strRequest = strRequest & " <ie xsi:type=""xsd:string"">latin1</
& Chr(13) & Chr(10)
strRequest = strRequest & " <oe xsi:type=""xsd:string"">latin1</
& Chr(13) & Chr(10)
strRequest = strRequest & " </ns1:doGoogleSearch>" & Chr(13) & Ch
strRequest = strRequest & " </SOAP-ENV:Body>" & Chr(13) & Chr(10)
strRequest = strRequest & "</SOAP-ENV:Envelope>" & Chr(13) & Chr(
'-----------------------------------------------------------
' The variable strRequest is now set to the SOAP request.
' Now it's sent to Google via HTTP using the Microsoft
' ServerXMLHTTP component.
'
' Create the object...
'-----------------------------------------------------------
Set xmlhttp = Server.CreateObject("MSXML2.ServerXMLHTTP")
'-----------------------------------------------------------
' Set the variable strURL equal to the URL for Google Web
' Services.
'-----------------------------------------------------------
strURL = "http://api.google.com/search/beta2"
'-----------------------------------------------------------
' Set the object to open the specified URL as an HTTP POST.
'-----------------------------------------------------------
xmlhttp.Open "POST", strURL, false
'-----------------------------------------------------------
' Set the Content-Type header for the request equal to
' "text/xml" so the server knows we're sending XML.
'-----------------------------------------------------------
xmlhttp.setRequestHeader "Content-Type", "text/xml"
'-----------------------------------------------------------
' Send the XML request created earlier to Google via HTTP.
'-----------------------------------------------------------
xmlhttp.Send(strRequest)
'-----------------------------------------------------------
' Set the object AllItems equal to the XML that Google sends
' back.
'-----------------------------------------------------------
Set AllItems = xmlhttp.responseXML
'-----------------------------------------------------------
' If the parser hit an error--usually due to malformed XML,
' write the error reason to the user. And stop the script.
' Google doesn't send malformed XML, so this code shouldn't
' run.
'-----------------------------------------------------------
If AllItems.parseError.ErrorCode <> 0 Then
response.write "Error: " & AllItems.parseError.reason
response.end
End If
'-----------------------------------------------------------
' Release the ServerXMLHTTP object now that it's no longer
' needed--to free the memory space it was using.
'-----------------------------------------------------------
Set xmlhttp = Nothing
'-----------------------------------------------------------
' Look for <faultstring> element in the XML that Google has
' returned. If it exists, Google is letting us know that
' something has gone wrong with the request.
'-----------------------------------------------------------
Set oError = AllItems.selectNodes("//faultstring")
If oError.length > 0 Then
Set oErrorText = AllItems.selectSingleNode("//faultstring")
GetResult = "Error: " & oErrorText.text
Exit Function
End If
'-----------------------------------------------------------
' This is what we're after: the <estimatedTotalResultsCount>
' element in the XML that Google has returned.
'-----------------------------------------------------------
Set oTotal = AllItems.selectSingleNode("//estimatedTotalResultsCo
GetResult = oTotal.text
Set oTotal = Nothing
End Function
'-----------------------------------------------------------
' Begin the HTML page. This portion of the page is the same
' for both the initial form and results.
'-----------------------------------------------------------
%>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Google Smackdown</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8
<script language="JavaScript">
// This client-side JavaScript function validates user input.
// If the form fields are empty when the user clicks "submit"
// this will stop the submit action, and prompt the user to
// enter some information.
function checkForm() {
  var f = document.frmGSmack;
  if ((f.text1.value == '') || (f.text1.value == ' ')) {
    alert('Please enter the first word or phrase.');
    return false;
  }
  if ((f.text2.value == '') || (f.text2.value == ' ')) {
    alert('Please enter the second word or phrase.');
    return false;
  }
  return true;
}
</script>
</head>
<body>
<h1>Google Smackdown</h1>
This queries Google via its API and receives the estimated total r
phrase.
<%
'-----------------------------------------------------------
' If the form request items "text1" and "text2" are not
' empty, then the form has been submitted to this page.
'
' It's time to call the GetResult( ) function and see which
' word or phrase wins the Smackdown.
'-----------------------------------------------------------
If request("text1") <> "" AND request("text2") <> "" Then
'-----------------------------------------------------------
' Send the word from the first form field to GetResult( ),
' and it will return the estimated total results.
'-----------------------------------------------------------
intResult1 = GetResult(request("text1"))
'-----------------------------------------------------------
' Check to make sure the first result is an integer. If not,
' Google has returned an error message and the script will
' move on.
'-----------------------------------------------------------
If isNumeric(intResult1) Then
intResult2 = GetResult(request("text2"))
End If
'-----------------------------------------------------------
' Check to make sure the second result is also an integer.
' If they're both numeric, the script can display the
' results.
'-----------------------------------------------------------
If isNumeric(intResult1) AND isNumeric(intResult2) Then
intResult1 = CDbl(intResult1)
intResult2 = CDbl(intResult2)
'-----------------------------------------------------------
' Begin writing the results to the page...
'-----------------------------------------------------------
response.write "<h2>The Results</h2>"
response.write "And the undisputed champion is...<br>"
response.write "<ol>"
'-----------------------------------------------------------
' Compare the two results to determine which should be
' displayed first.
'-----------------------------------------------------------
If intResult1 > intResult2 Then
response.write "<li>" & request("text1") & " (<a target=""_blan
href=""http://www.google.com/search?hl=en&ie=UTF8&oe=UTF8&q=" & Se
URLEncode("""" & request("text1") & """") & """>" & FormatNumber
(intResult1,0) & "</a>)<br>"
response.write "<li>" & request("text2") & " (<a target=""_blan
href=""http://www.google.com/search?hl=en&ie=UTF8&oe=UTF8&q=" & Se
URLEncode("""" & request("text2") & """") & """>" & FormatNumber
(intResult2,0) & "</a>)<br>"
Else
response.write "<li>" & request("text2") & " (<a target=""_blan
href=""http://www.google.com/search?hl=en&ie=UTF8&oe=UTF8&q=" &
Server.URLEncode("""" & request("text2") & """") & """>" & FormatN
(intResult2,0) & "</a>)<br>"
response.write "<li>" & request("text1") & " (<a target=""_blan
href=""http://www.google.com/search?hl=en&ie=UTF8&oe=UTF8&q=" & Se
URLEncode("""" & request("text1") & """") & """>" & FormatNumber
(intResult1,0) & "</a>)<br>"
End If
'-----------------------------------------------------------
' Finish writing the results to the page and include a link
' to the page for another round.
'-----------------------------------------------------------
response.write "</ol>"
response.write "<a href=""smackdown.asp"">Another Challenge?</a>
response.write "<br>"
Else
'-----------------------------------------------------------
' One or both of the results are not numeric. We can assume
' this is because the developer's key has reached its
' 1,000 query limit for the day. Because the script has
' made it to this point, the SOAP response did not return
' an error. If it had, GetResult( ) would have stopped the
' script.
'-----------------------------------------------------------
intResult1 = Replace(intResult1,"key " & strGoogleKey,"key")
intResult2 = Replace(intResult2,"key " & strGoogleKey,"key")
'-----------------------------------------------------------
' Write out the error to the user...
'-----------------------------------------------------------
response.write "<h2>It Didn't Work, Error</h2>"
'-----------------------------------------------------------
' If the results are the same, we don't need to write out
' both of them.
'-----------------------------------------------------------
If intResult1 = intResult2 Then
response.write intResult1 & "<br><br>"
Else
response.write intResult1 & "<br><br>" & intResult2 &
"<br><br>"
End If
'-----------------------------------------------------------
' A link to the script for another round.
'-----------------------------------------------------------
response.write "<a href=""smackdown.asp"">Another Challenge?</a>
response.write "<br>"
End If
Else
'-----------------------------------------------------------
' The form request items "text1" and "text2" are empty,
' which means the form has not been submitted to the page
' yet.
'-----------------------------------------------------------
%>
<h2>The Arena</h2>
<div class="clsPost">The setting is the most impressive search eng
<a href="http://www.google.com/">Google</a>. As a test of its <a h
"http://www.google.com/apis">API</a>, two words or phrases will go
in a terabyte tug-of-war. Which one appears in more pages across t
<h2>The Challengers</h2>
You choose the warring words...
<br><br>
<form name="frmGSmack" action="smackdown.asp" method="post" onSubm
checkForm( );">
<table>
<tr>
<td align="right">word/phrase 1</td> <td><input type="text" name
"text1"></td>
</tr>
<tr>
<td align="right">word/phrase 2</td> <td><input type="text
" name="text2"></td>
</tr>
<tr>
<td> </td><td><input type="submit" value="throw down!
"></td>
</tr>
</table>
</form>
<%
End If
'-----------------------------------------------------------
' This is the end of the If statement that checks to see
' if the form has been submitted. Both states of the page
' get the closing tags below.
'-----------------------------------------------------------
%>
</body>
</html>
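For readers who don't run IIS, the two steps at the heart of GetResult() and the results page can be sketched in Python 3: pull <estimatedTotalResultsCount> (or a <faultstring>) out of the SOAP response, then put the larger count first. This is only an illustration under stated assumptions. The sample response is a hand-written stand-in rather than a captured Google reply, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a SOAP response; a real reply carries more fields.
SAMPLE = """<SOAP-ENV:Envelope
 xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <return>
      <estimatedTotalResultsCount>{count}</estimatedTotalResultsCount>
    </return>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

def estimated_count(response_xml):
    """Return the estimated result count, or raise if a fault is reported."""
    root = ET.fromstring(response_xml)
    for element in root.iter():
        local_name = element.tag.rsplit('}', 1)[-1]  # ignore any namespace
        if local_name == 'faultstring':
            raise RuntimeError(element.text)
        if local_name == 'estimatedTotalResultsCount':
            return int(element.text)
    raise ValueError('no result count in response')

def smackdown(first, second):
    """Order two (term, count) pairs as the ASP page does: the first term
    leads only if its count is strictly greater."""
    return [first, second] if first[1] > second[1] else [second, first]

fred = ('Fred Mertz', estimated_count(SAMPLE.format(count=1200)))
ethel = ('Ethel Mertz', estimated_count(SAMPLE.format(count=3400)))
ranking = smackdown(fred, ethel)
```

The counts above are invented to exercise the code; real numbers come only from a live query.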
2.22.2. Running the Hack
The hack is run in exactly the same manner as the live version
of Google Smackdown
(http://www.onfocus.com/googlesmack/down.asp) running on
Onfocus.com. Point your web browser at it and fill out the form.
Figure 2-14 shows a sample Smackdown between negative
feelings about Macintosh versus Windows.
Figure 2-14. Macintosh/Windows Google Smackdown
Hack 41. Scrape Yahoo! Buzz for a Google Search
A proof-of-concept hack scrapes the buzziest items from Yahoo!
Buzz and submits them to a Google search.
No web site is an island. Billions of hyperlinks link to billions of
documents. Sometimes, however, you want to take information
from one site and apply it to another site.
Unless that site has a web service API like Google's, your best
bet is scraping. Scraping is where you use an automated
program to extract specific bits of information from a web page.
Examples of the sorts of elements people scrape include stock
quotes, news headlines, prices, and so forth. You name it and
someone's probably scraped it.
There's some controversy about scraping. Some sites don't
mind it, while others can't stand it. If you decide to scrape a
site, do it gently; take the minimum amount of information you
need and, whatever you do, don't hog the scrapee's bandwidth.
So, what are we scraping?
Google has a query popularity page called Google Zeitgeist
(http://www.google.com/press/zeitgeist.html). Unfortunately, the
Zeitgeist is updated only once a week and contains only a
limited amount of scrapable data. That's where Yahoo! Buzz
(http://buzz.yahoo.com) comes in. The site is rich with
constantly updated information. Its Buzz Index keeps tabs on
what's hot in popular culture: celebs, games, movies, television
shows, music, and more.
This hack grabs the buzziest of the buzz, the top of the
Leaderboard, and searches Google for all it knows on the
subject. And to keep things current, only pages indexed by
Google within the past few days [Hack #16] are considered.
This hack requires additional Perl
modules: Time::JulianDay
(http://search.cpan.org/search?query=Time%3A%3AJulianDay) and
LWP::Simple
(http://search.cpan.org/search?query=LWP%3A%3ASimple). It won't run
without them.
2.23.1. The Code
Save the following code to a plain text file named buzzgle.pl:
#!/usr/local/bin/perl
# buzzgle.pl
# Pull the top item from the Yahoo! Buzz Index and query the last
# three days' worth of Google's index for it.
# Usage: perl buzzgle.pl
# Your Google API developer's key.
my $google_key='insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
# Number of days back to go in the Google index.
my $days_back = 3;
use strict;
use SOAP::Lite;
use LWP::Simple;
use Time::JulianDay;
# Scrape the top item from the Yahoo! Buzz Index.
# Grab a copy of http://buzz.yahoo.com.
my $buzz_content = get("http://buzz.yahoo.com/")
or die "Couldn't grab the Yahoo Buzz: $!";
# Find the first item on the Buzz Index list.
my($buzziest) = $buzz_content =~ m!http://search.yahoo.com/search
die "Couldn't figure out the Yahoo! buzz\n" unless $buzziest;
# Figure out today's Julian date.
my $today = int local_julian_day(time);
# Build the Google query.
my $query = "\"$buzziest\" daterange:" . ($today - $days_back) . "
print
"The buzziest item on Yahoo Buzz today is: $buzziest\n",
"Querying Google for: $query\n",
"Results:\n\n";
# Create a new SOAP::Lite instance, feeding it GoogleSearch.wsdl.
my $google_search = SOAP::Lite->service("file:$google_wdsl");
# Query Google.
my $results = $google_search ->
doGoogleSearch(
$google_key, $query, 0, 10, "false", "", "false",
"", "latin1", "latin1"
);
# No results?
@{$results->{resultElements}} or die "No results";
# Loop through the results.
foreach my $result (@{$results->{'resultElements'}}) {
my $output =
join "\n",
$result->{title} || "no title",
$result->{URL},
$result->{snippet} || 'no snippet',
"\n";
$output =~ s!<.+?>!!g; # drop all HTML tags
print $output;
}
2.23.2. Running the Hack
The script runs from the command line ["How to Run the Hacks"
in the Preface] without need of arguments of any kind. Probably
the best thing to do is to direct the output to a pager (a
command-line application that allows you to page through long
output, usually by hitting the spacebar), like so:
% perl buzzgle.pl | more
Or you can direct the output to a file for later perusal:
% perl buzzgle.pl > buzzgle.txt
As with all scraping applications, this code is fragile, subject to
breakage if (read: when) HTML formatting of the Yahoo! Buzz
page changes. If you find you have to adjust to match Yahoo!'s
formatting, you'll have to alter the regular expression match as
appropriate:
my($buzziest) = $buzz_content =~ m!http://search.yahoo.com/search
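The same kind of match can be prototyped against a saved copy of the page before the script itself is changed. The Python 3 sketch below captures the link text of the first anchor pointing at search.yahoo.com/search. The pattern and the sample HTML are assumptions made for illustration; they are not the expression used in buzzgle.pl, and the real page markup may differ.

```python
import re

# Hypothetical pattern: capture the text of the first search link.
BUZZ_LINK = re.compile(
    r'<a href="http://search\.yahoo\.com/search\?[^"]*">(.+?)</a>')

def first_buzz_item(html):
    """Return the link text of the first Buzz search link, or None."""
    match = BUZZ_LINK.search(html)
    return match.group(1) if match else None

# Invented markup standing in for a saved copy of the Buzz page.
SAMPLE_PAGE = '''<ol>
<li><a href="http://search.yahoo.com/search?p=Maria+Sharapova">Maria Sharapova</a></li>
<li><a href="http://search.yahoo.com/search?p=Another+Term">Another Term</a></li>
</ol>'''

buzziest = first_buzz_item(SAMPLE_PAGE)
```

Returning None when nothing matches gives the caller a clean way to stop, much as the Perl script dies when $buzziest is empty.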
Regular expressions and general HTML
scraping are beyond the scope of this
book. For more information, I suggest you
consult O'Reilly's Perl and LWP
(http://www.oreilly.com/catalog/perllwp) or
Mastering Regular Expressions
(http://www.oreilly.com/catalog/regex).
2.23.3. The Results
At the time of this writing, Maria Sharapova, the Russian tennis
star, is all the rage:
% perl buzzgle.pl | less
The buzziest item on Yahoo Buzz today is: Maria Sharapova
Querying Google for: "Maria Sharapova" daterange:2453292-2453295
Results:
Maria Sharapova
http://www.mariaworld.net/
everything about Maria Sharapova: photos, interviews, articles, st
much more! ... Maria Sharapova: 2004 Tokyo Champion! ...
Maria Sharapova
http://www.mariaworld.net/photos.htm
everything about Maria Sharapova: photos, interviews, articles, st
much more! HOME, BIOGRAPHY, PHOTOS, RESULTS, ...
Maria Sharapova Picture Page
http://milano.vinden.nl/
Maria Sharapova Picture Page. Country: Russia. Date of Birth: Apri
Birth: Nyagan, Russia. Residence: Bradenton, Florida USA. Height:
2.23.4. Hacking the Hack
Here are some ideas for hacking the hack:
• As it stands, the program returns 10 results. You could
change that to one result and immediately open that
result instead of returning a list. Bravo, you've just
written I'm Feeling Popular, as in Google's I'm Feeling
Lucky.
• This version of the program searches the last three days
of indexed pages. Because there's a slight lag in
indexing news stories, I would search at least the last two
days' worth of indexed pages, but you could extend it to
seven days or even a month. Simply change my
$days_back = 3;, altering the value of the $days_back
variable.
• You could create a "Buzz Effect" hack by running the
Yahoo! Buzz query with and without the date range
limitation. How do the results change between a full
search and a search of the last few days?
• Yahoo!'s Buzz has several different sections. This one
looks at the Buzz summary, but you could create other
ones based on Yahoo!'s other buzz charts (television,
http://buzz.yahoo.com/television/, for instance).
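To see what changing $days_back does to the query, it helps to work the date arithmetic by hand. The Python 3 sketch below builds the same kind of daterange: query from a calendar date. The constant converts Python's ordinal day numbers to Julian day numbers; the function names are hypothetical, and the date used is the one that reproduces the sample run shown earlier in this hack.

```python
from datetime import date

# date.toordinal() + this offset gives the Julian day number.
JDN_OFFSET = 1721425

def julian_day(day):
    """Julian day number for a calendar date."""
    return day.toordinal() + JDN_OFFSET

def buzz_query(term, today, days_back=3):
    """Quote the term and restrict it to the last `days_back` days."""
    end = julian_day(today)
    return '"%s" daterange:%d-%d' % (term, end - days_back, end)

three_days = buzz_query('Maria Sharapova', date(2004, 10, 16))
one_week = buzz_query('Maria Sharapova', date(2004, 10, 16), days_back=7)
```

Only the first number in the range moves when days_back changes; the second always stays at today's Julian day.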
Hack 42. Compare Google's Results with Other Search Engines
Compare Google search results with results from other search
engines.
True Google fanatics might not like to think so, but there's really
more than one search engine. Google's competitors include the
likes of Teoma and Yahoo!.
Equally surprising to the average Google fanatic is the fact that
Google doesn't index the entire Web. There are, at the time of
this writing, over eight billion web pages in the Google index, but
that's just a fraction of the Web. You'd be amazed how much
nonoverlapping content there is in each search engine. Some
queries that bring only a few results on one search engine bring
plenty on another search engine.
This hack gives you a program that compares counts for Google
and several other search engines, with an easy way to plug in
new search engines that you want to include. This version of the
hack searches different domains for the query, in addition to
getting the full count for the query itself.
T his hac k requires the LWP::Simple
(http://s earc h.c pan.org/s earc h?
query=LWP % 3 A % 3 A Simple) module to
run.
2.24.1. The Code
Save the following code as a CGI script ["How to Run the Hacks" in
the Preface] named google_compare.cgi in your web site's cgi-bin
directory:
#!/usr/local/bin/perl
# google_compare.cgi
# Compares Google results against those of other search engines.
# Your Google API developer's key.
my $google_key='insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
use strict;
use SOAP::Lite;
use LWP::Simple qw(get);
use CGI qw{:standard};
my $googleSearch = SOAP::Lite->service("file:$google_wdsl");
# Set up our browser output.
print "Content-type: text/html\n\n";
print "<html><title>Google Compare Results</title><body>\n";
# Ask and we shall receive.
my $query = param('query');
unless ($query) {
    print "<h1>No query defined.</h1></body></html>\n\n";
    exit; # If there's no query there's no program.
}
# Spit out the original before we encode.
print "<h1>Your original query was '$query'.</h1>\n";
$query =~ s/\s/\+/g ; #changing the spaces to + signs
$query =~ s/\"/%22/g; #changing the quotes to %22
# Create some hashes of queries for various search engines.
# We have four types of queries ("plain", "com", "org", and "net")
# and three search engines ("Google", "AlltheWeb", and "Altavista").
# Each engine has a name, query, and regular expression used to
# scrape the results.
my $query_hash = {
    plain => {
        Google => { name => "Google", query => $query, },
        AlltheWeb => {
            name => "AlltheWeb",
            regexp => '<span class="ofSoMany">(.*)</span>',
            query => "http://www.alltheweb.com/search?cat=web&q=$que
        },
        Altavista => {
            name => "Altavista",
            regexp => 'AltaVista found (.*) results',
            query => "http://www.altavista.com/sites/search/web?q=$qu
        },
    },
    com => {
        Google => { name => "Google", query => "$query site:com", },
        AlltheWeb => {
            name => "AlltheWeb",
            regexp => '<span class="ofSoMany">(.*)</span>',
            query => "http://www.alltheweb.com/search?cat=web&q=$qu
        },
        Altavista => {
            name => "Altavista",
            regexp => 'AltaVista found (.*) results',
            query => "http://www.altavista.com/sites/search/web?q=$qu
        },
    },
    org => {
        Google => { name => "Google", query => "$query site:org", },
        AlltheWeb => {
            name => "AlltheWeb",
            regexp => '<span class="ofSoMany">(.*)</span>',
            query => "http://www.alltheweb.com/search?cat=web&q=$query+domain%3Aorg",
        },
        Altavista => {
            name => "Altavista",
            regexp => 'AltaVista found (.*) results',
            query => "http://www.altavista.com/sites/search/web?q=$qu
        },
    },
    net => {
        Google => { name => "Google", query => "$query site:net", },
        AlltheWeb => {
            name => "AlltheWeb",
            regexp => '<span class="ofSoMany">(.*)</span>',
            query => "http://www.alltheweb.com/search?cat=web&q=$que
        },
        Altavista => {
            name => "Altavista",
            regexp => 'AltaVista found (.*) results',
            query => "http://www.altavista.com/sites/search/web?q=$qu
        },
    },
};
# Now we loop through each of our query types
# under the assumption there's a matching
# hash that contains our engines and string.
foreach my $query_type (keys (%$query_hash)) {
    print "<h2>Results for a '$query_type' search:</h2>\n";
    # Now, loop through each engine we have and get/print the results.
    foreach my $engine (values %{$query_hash->{$query_type}}) {
        my $results_count;
        # If this is Google, we use the API and not port 80.
        if ($engine->{name} eq "Google") {
            my $result = $googleSearch->doGoogleSearch(
                $google_key, $engine->{query}, 0, 1,
                "false", "", "false", "", "latin1", "latin1");
            $results_count = $result->{estimatedTotalResultsCount};
            # The Google API doesn't format numbers with commas.
            my $rresults_count = reverse $results_count;
            $rresults_count =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
            $results_count = scalar reverse $rresults_count;
        }
        # It's not Google, so we GET like everyone else.
        elsif ($engine->{name} ne "Google") {
            my $data = get($engine->{query}) or print "ERROR: $!";
            $data =~ /$engine->{regexp}/; $results_count = $1 || 0;
        }
        # and print out the results.
        print "<strong>$engine->{name}</strong>: $results_count<br />\n";
    }
}
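The comma-formatting step in the Google branch is easy to read past. As a minimal standalone sketch of the same reverse, substitute, reverse trick (the commify name and the sample numbers are ours, chosen only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of google_compare.cgi): insert a comma
# after every third digit, counting from the right.
sub commify {
    my ($number) = @_;
    my $reversed = reverse $number;                  # "1234567" -> "7654321"
    $reversed =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;    # "765,432,1"
    return scalar reverse $reversed;                 # "1,234,567"
}

print commify($_), "\n" for (999, 1000, 1234567);
```

Reversing the string first lets the substitution count digits in groups of three from the low end, which a left-to-right regular expression cannot do directly.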
2.24.2. Running the Hack
This hack runs as a CGI script, called from your web browser as
google_compare.cgi?query=your query keywords.
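If you build that URL from another program rather than typing it into a browser, the query keywords need to be encoded first. The following is a rough sketch only: encode_query is a hypothetical helper of ours, and it assumes plain ASCII input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: percent-encode everything except unreserved
# characters and spaces, then turn the spaces into plus signs.
# Assumes ASCII input; wider characters would need byte encoding first.
sub encode_query {
    my ($query) = @_;
    $query =~ s/([^A-Za-z0-9\-_.~ ])/sprintf("%%%02X", ord($1))/ge;
    $query =~ tr/ /+/;
    return $query;
}

print "google_compare.cgi?query=", encode_query('"perl hacks" & more'), "\n";
```

The script itself only rewrites spaces and double quotes, so encoding the ampersand here is an extra precaution: without it, everything after the & would be read as a separate CGI parameter.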
2.24.3. Why?
You might be wondering why you would want to compare result counts
across search engines. It's a good idea to follow what different
search engines offer in terms of results. While you might find that
a phrase on one search engine provides only a few results, another
engine might return results aplenty. It makes sense to spend your
time and energy using the latter for the research at hand.
Tara Calishain and Kevin Hemenway
Hack 43. Scattersearch with Yahoo! and
Google
Sometimes, illuminating results can be found when scraping from one
site and feeding the results into the API of another. With
scattersearching, you can narrow down the most popular related
results, as suggested by Yahoo! and Google.
We've combined a scrape of a Yahoo! web page with a Google search
[Hack #41], blending scraped data with data generated via a web
service API to good effect. In this hack, we're doing something
similar, except this time we're taking the results of a Yahoo!
search and blending them with a Google search.
Yahoo! has a "Related searches" feature, where you enter a search
term and get a list of related terms under the search box, if any are
available. This hack scrapes those related terms and performs a
Google search for the related terms in the title. It then returns the
count for those searches, along with a direct link to the results.
Aside from showing how scraped and API-generated data can live
together in harmony, this hack is good to use when you're exploring
concepts; for example, you might know that something called Pokemon
exists, but you might not know anything about it. You'll get Yahoo!'s
related searches and an idea of how many results each of those
searches generates in Google. From there, you can choose the search
terms that generate the most results or look the most promising based
on your limited knowledge, or you can simply pick a road that appears
less traveled.
2.25.1. The Code
Save the following code to a file called scattersearch.pl.
Bear in mind that this hack, while using the Google API for the
Google portion, involves some scraping of Yahoo!'s search pages and
thus is rather brittle. If it stops working at any point, take a
gander at the regular expressions, for they're almost sure to be the
breakage point.
#!/usr/bin/perl -w
# Scattersearch -- Use the search suggestions from
# Yahoo! to build a series of intitle: searches at Google.
use strict;
use LWP;
use SOAP::Lite;
use CGI qw/:standard/;
# Get our query, else die miserably.
my $query = shift @ARGV; die unless $query;
# Your Google API developer's key.
my $google_key = 'insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
# Search Yahoo! for the query.
my $ua = LWP::UserAgent->new;
my $url = URI->new('http://search.yahoo.com/search');
$url->query_form(rs => "more", p => $query);
my $yahoosearch = $ua->get($url)->content;
$yahoosearch =~ s/[\f\t\n\r]//isg;
# And determine if there were any results.
$yahoosearch =~ m!Also try:(.*?) !migs;
die "Sorry, there were no results!\n" unless $1;
my $recommended = $1;
# Now, add all our results into
# an array for Google processing.
my @googlequeries;
while ($recommended =~ m!<a href=".*?">(.*?)</a>!mgis) {
my $searchitem = $1;
$searchitem =~ s/nobr|<[^>]*>|\///g;
print "$searchitem\n";
push (@googlequeries, $searchitem);
}
# Print our header for the results page.
print join "\n",
start_html("ScatterSearch"),
h1("Your Scattersearch Results"),
p("Your original search term was '$query'"),
p("That search had " . scalar(@googlequeries). " recommended
p("Here are result numbers from a Google search"),
CGI::start_ol( );
# Create our Google object for API searches.
my $gsrch = SOAP::Lite->service("file:$google_wdsl");
# Running the actual Google queries.
foreach my $googlesearch (@googlequeries) {
my $titlesearch = "allintitle:$googlesearch";
my $count = $gsrch->doGoogleSearch($google_key, $titlesearch,
0, 1, "false", "", "false",
"", "", "");
my $url = $googlesearch; $url =~ s/ /+/g; $url =~ s/\"/%22/g;
print li("There were $count->{estimatedTotalResultsCount} ".
"results for the recommended search <a href=\"http://
"google.com/search?q=$url&num=100\">$googlesearch</a>
}
print CGI::end_ol( ), end_html;
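Because the scraping step is the part most likely to break, it helps to exercise it on its own. Here is a small sketch, assuming a made-up HTML fragment in place of Yahoo!'s real markup; related_terms is a hypothetical name of ours:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the "Also try" fragment of a results page.
my $recommended = '<a href="/s?p=siamese+cat"><nobr>siamese cat</nobr></a>, '
                . '<a href="/s?p=siamese+twins">siamese <b>twins</b></a>';

# Collect the visible text of every link in the fragment.
sub related_terms {
    my ($html) = @_;
    my @terms;
    while ($html =~ m!<a href=".*?">(.*?)</a>!gis) {
        my $term = $1;
        $term =~ s/<[^>]*>//g;    # drop any markup nested inside the link
        push @terms, $term;
    }
    return @terms;
}

my @terms = related_terms($recommended);
print "allintitle:$_\n" for @terms;
```

When the live page changes, pasting a fresh fragment into $recommended is a quick way to see whether the pattern still matches before spending any Google API requests.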
2.25.2. Running the Hack
This script generates an HTML file, ready for you to upload to a
publicly accessible web site. If you want to save the output of a
search for siamese to a file called scattersearch.html in your Sites
directory, run the following command ["How to Run the Hacks" in the
Preface]:
% perl scattersearch.pl "siamese" > ~/Sites/scattersearch.html
Your final results, as rendered by your browser, will look similar
to Figure 2-15.
Figure 2-15. Scattersearch results for siamese
You'll have to do a little experimenting to find out which terms
have related searches. Broadly speaking, very general search terms
are bad; it's better to zero in on terms that people would search for
and that would be easy to group together. At the time of this
writing, for example, heart has no related search terms, but blood
pressure does.
2.25.3. Hacking the Hack
You have two choices: you can either hack the interaction with
Yahoo! or expand it to include something in addition to or instead of
Yahoo! itself. Let's look at Yahoo! first. If you take a close look
at the code, you'll see we're passing an unusual parameter to our
Yahoo! search results page:
$url->query_form(rs => "more", p => $query);
The rs=>"more" part of the search shows the related search terms.
Getting the related search this way will show up to 10 results. If
you remove that portion of the code, you'll get roughly four related
searches when they're available. That might suit you if you want only
a few, but perhaps you want dozens and dozens! In that case, replace
more with all.
Beware, though: this can generate a lot of related searches, and it
can certainly eat up your daily allowance of Google API requests.
Tread carefully.
Kevin Hemenway and Tara Calishain
Hack 44. Yahoo! Directory Mindshare in
Google
How does link popularity compare in Yahoo!'s searchable subject
index versus Google's full-text index? Find out by calculating
mindshare!
Yahoo! and Google are two very different animals. Yahoo! indexes
only a site's main URL, title, and description, while Google builds
full-text indexes of entire sites. Surely there's some interesting
cross-pollination when you combine results from the two.
This hack scrapes all the URLs in a specified subcategory of the
Yahoo! directory. It then takes each URL and gets its link count from
Google. Each link count provides a nice snapshot of how a particular
Yahoo! category and its listed sites stack up on the popularity
scale.
What's a link count? It's simply the total number of pages in
Google's index that link to a specific URL.
There are a couple of ways you can use your knowledge of a
subcategory's link count. If you find a subcategory whose URLs have
only a few links each in Google, you may have found a subcategory
that isn't getting a lot of attention from Yahoo!'s editors. Consider
going elsewhere for your research. If you're a webmaster and you're
considering paying to have Yahoo! add you to their directory, run
this hack on the category in which you want to be listed. Are most of
the links really popular? If they are, are you sure your site will
stand out and get clicks? Maybe you should choose a different
category.
We got this idea from a similar experiment Jon Udell
(http://weblog.infoworld.com/udell/) did in 2001. He used AltaVista
instead of Google; see
http://udell.roninhouse.com/download/mindshare-script.txt. We
appreciate the inspiration, Jon!
2.26.1. The Code
You will need a Google API account (http://api.google.com), as well
as the SOAP::Lite (http://www.soaplite.com) and HTML::LinkExtor
(http://search.cpan.org/author/GAAS/HTML-Parser/lib/HTML/LinkExtor.pm)
Perl modules to run this hack.
Save the code as mindshare_calculator.pl, remembering to replace
insert key here with your Google API key:
#!/usr/bin/perl -w
use strict;
use LWP::Simple;
use HTML::LinkExtor;
use SOAP::Lite;
my $google_key = 'insert key here';
my $google_wdsl = "GoogleSearch.wsdl";
my $yahoo_dir = shift || "/Computers_and_Internet/Data_Formats/X
"eXtensible_Markup_Language_/RSS/News_Aggregator
# Download the Yahoo! directory.
my $data = get("http://dir.yahoo.com" . $yahoo_dir) or die $!;
# Create our Google object.
my $google_search = SOAP::Lite->service("file:$google_wdsl");
my %urls; # where we keep our counts and titles.
# Extract all the links and parse 'em.
HTML::LinkExtor->new(\&mindshare)->parse($data);
sub mindshare { # for each link we find...
my ($tag, %attr) = @_;
# Continue on only if the tag was a link,
# and the URL matches Yahoo!'s redirectory.
return if $tag ne 'a';
return unless $attr{href} =~ /rds.yahoo/;
return unless $attr{href} =~ /\*http/;
# Now get our real URL.
$attr{href} =~ /\*(http.*)/; my $url = $1;
$url =~ s/%3A/:/; # turn encoding into legits.
# And process each URL through Google.
my $results = $google_search->doGoogleSearch(
$google_key, "link:$url", 0, 1,
"true", "", "false", "", "", ""
); # wheee, that was easy, guvner.
$urls{$url} = $results->{estimatedTotalResultsCount};
}
# Now sort and display.
my @sorted_urls = sort { $urls{$b} <=> $urls{$a} } keys %urls;
foreach my $url (@sorted_urls) { print "$urls{$url}: $url\n"; }
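The two steps that do not involve Google, unwrapping the redirect link and sorting by count, can be tried in isolation. The sketch below uses invented links and invented counts; real_url is a hypothetical helper of ours that mirrors the checks inside the mindshare subroutine:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical redirect-style links and link counts, for illustration only.
my %sample = (
    'http://rds.yahoo.com/S=1/*http%3A//www.example.com/' => 42,
    'http://rds.yahoo.com/S=2/*http%3A//www.example.org/' => 340,
    'http://dir.yahoo.com/Entertainment/'                 => 0,
);

# Pull the real URL out of a redirect link; return undef for anything else.
sub real_url {
    my ($href) = @_;
    return unless $href =~ /\*(http.*)/;
    my $url = $1;
    $url =~ s/%3A/:/;    # decode the escaped colon
    return $url;
}

my %urls;
for my $href (keys %sample) {
    my $url = real_url($href);
    next unless defined $url;
    $urls{$url} = $sample{$href};
}

# Highest link count first, as in the script's final report.
for my $url (sort { $urls{$b} <=> $urls{$a} } keys %urls) {
    print "$urls{$url}: $url\n";
}
```

Links that lack the embedded *http marker, such as the directory's own navigation, are skipped, which is what keeps them out of the final report.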
2.26.2. Running the Hack
The hack's only configuration, the Yahoo! directory you're
interested in, is passed as a single argument (in quotes) on the
command line ["How to Run the Hacks" in the Preface]. If you don't
pass one of your own, a default directory will be used instead.
% perl mindshare_calculator.pl "/Entertainment/Humor/Procrastinati
Your results show the URLs in those directories, sorted by total
Google links:
340: http://www.p45.net/
246: http://www.ishouldbeworking.com/
81: http://www.india.com/
33: http://www.jlc.net/~useless/
23: http://www.geocities.com/SouthBeach/1915/
18: http://www.eskimo.com/~spban/creed.html
13: http://www.black-schaffer.org/scp/
3: http://www.angelfire.com/mi/psociety
2: http://www.geocities.com/wastingstatetime/
2.26.3. Hacking the Hack
Yahoo! isn't the only searchable subject index out there, of course.
There's also the Open Directory Project (DMOZ, http://www.dmoz.org),
which is the product of thousands of volunteers busily cataloging and
categorizing sites on the Web: the web community's Yahoo!, if you
will. This hack works just as well on DMOZ as it does on Yahoo!;
they're very similar in structure.
Replace the default Yahoo! directory with its DMOZ equivalent:
my $dmoz_dir = shift || "/Reference/Libraries/Library_and_Informat
Science/Technical_Services/Cataloguing/Metadata/RDF/Applications/R
News_Readers/";
You'll also need to change the download instructions:
# Download the Dmoz.org directory.
my $data = get("http://dmoz.org" . $dmoz_dir) or die $!;
Next, replace the lines that check whether a URL should be measured
for mindshare. When we were scraping Yahoo! in our original script,
all directory entries were always prepended with
http://srd.yahoo.com/ and then the URL itself. Thus, to ensure we
received a proper URL, we skipped over the link unless it matched
those criteria:
return unless $attr{href} =~ /srd.yahoo/;
return unless $attr{href} =~ /\*http/;
Since DMOZ is an entirely different site, our checks for validity
have to change. DMOZ doesn't modify the outgoing URL, so our previous
Yahoo! checks have no relevance here. Instead, we'll make sure it's a
full-blooded location (i.e., it starts with http://) and it doesn't
match any of DMOZ's internal page links. Likewise, we'll ignore
searches on other engines:
return unless $attr{href} =~ /^http/;
return if $attr{href} =~ /dmoz|google|altavista|lycos|yahoo|allthe
Our last change is to modify the bit of code that gets the real URL
from Yahoo!'s modified version. Instead of "finding the URL within
the URL":
# Now get our real URL.
$attr{href} =~ /\*(http.*)/; my $url = $1;
we simply assign the URL that HTML::LinkExtor has found:
# Now get our real URL.
my $url = $attr{href};
Can you go even further with this? Sure! You might want to search a
more specialized directory, such as the FishHoo! fishing search
engine (http://www.fishhoo.com).
You might want to return only the most linked-to URL from the
directory, which is quite easy, by piping the results ["How to Run
the Hacks" in the Preface] to another common Unix utility:
% perl mindshare_calculator.pl | head -n 1
Alternatively, you might want to go ahead and grab the top 10 Google
matches for the URL that has the most mindshare. To do so, add the
following code to the bottom of the script:
print "\nMost popular URLs for the strongest mindshare:\n";
my $most_popular = shift @sorted_urls;
my $results = $google_search->doGoogleSearch(
$google_key, "$most_popular", 0, 10,
"true", "", "false", "", "", "" );
foreach my $element (@{$results->{resultElements}}) {
next if $element->{URL} eq $most_popular;
print " * $element->{URL}\n";
print " \"$element->{title}\"\n\n";
}
Then, run the script as usual (the output here uses the default
hardcoded directory).
% perl mindshare_calculator.pl
27800: http://radio.userland.com/
6670: http://www.oreillynet.com/meerkat/
5460: http://www.newsisfree.com/
3280: http://ranchero.com/software/netnewswire/
1840: http://www.disobey.com/amphetadesk/
847: http://www.feedreader.com/
797: http://www.serence.com/site.php?page=prod_klipfolio
674: http://bitworking.org/Aggie.html
492: http://www.newzcrawler.com/
387: http://www.sharpreader.net/
112: http://www.awasu.com/
102: http://www.bloglines.com/
67: http://www.blueelephantsoftware.com/
57: http://www.blogtrack.com/
50: http://www.proggle.com/novobot/
Most popular URLs for the strongest mindshare:
* http://groups.yahoo.com/group/radio-userland/
"Yahoo! Groups : radio-userland"
* http://groups.yahoo.com/group/radio-userland-francophone/messag
"Yahoo! Groupes : radio-userland-francophone Messages : Message
* http://www.fuzzygroup.com/writing/radiouserland_faq.htm
"Fuzzygroup :: Radio UserLand FAQ"
...
Kevin Hemenway and Tara Calishain
Hack 45. Glean Weblog-Free Google
Results
With so many weblogs being indexed by Google, you might worry about
too much emphasis on the hot topic of the moment. In this hack, we'll
show you how to remove the weblog factor from your Google results.
Weblogs, those frequently updated, link-heavy personal pages, are
quite the fashionable thing these days. There are at least 4,000,000
active weblogs across the Internet, covering almost every possible
subject and interest. For humans, they're good reading, but for
search engines, they're heavenly bundles of fresh content and links
galore.
Some people think that the search engine's delight in weblogs slants
search results by placing too much emphasis on too small a group of
recent rather than evergreen content. As I write, for example, I am
the twelfth most important Ben on the Internet, according to Google.
This rank comes solely from my weblog's popularity.
This hack searches Google, discarding any results coming from
weblogs. It uses the Google Web Services API (http://api.google.com)
and the API of Technorati (http://www.technorati.com/members), an
excellent interface to David Sifry's weblog data-tracking tool. Both
APIs require keys, available from the URLs mentioned.
Finally, you'll need a simple HTML page with a form that passes a
text query to the parameter q (the query that will run on Google),
something like this:
<form action="googletech.cgi" method="POST">
Your query: <input type="text" name="q">
<input type="submit" name="Search!" value="Search!">
</form>
Save the form as googletech.html.
2.27.1. The Code
Save the following code ["How to Run the Hacks" in the Preface] to a
file called googletech.cgi.
You'll need the XML::Simple and SOAP::Lite Perl modules to run this
hack.
#!/usr/bin/perl -w
# googletech.cgi
# Getting Google results
# without getting weblog results.
use strict;
use SOAP::Lite;
use XML::Simple;
use CGI qw(:standard);
use HTML::Entities ( );
use LWP::Simple qw(!head);
my $technoratikey = "insert technorati key here";
my $googlekey = "insert google key here";
# Set up the query term
# from the CGI input.
my $query = param("q");
# Initialize the SOAP interface and run the Google search.
my $google_wdsl = "http://api.google.com/GoogleSearch.wsdl";
my $service = SOAP::Lite->service($google_wdsl);
my $result = $service->doGoogleSearch(
$googlekey, $query, 0, 10,
"false", "", "false", "", "", "");
# Start returning the results page;
# do this now to prevent timeouts.
my $cgi = new CGI;
print $cgi->header( );
print $cgi->start_html(-title=>'Blog Free Google Results');
print $cgi->h1('Blog Free Results for '. "$query");
print $cgi->start_ul( );
# Go through each of the results.
foreach my $element (@{$result->{'resultElements'}}) {
my $url = HTML::Entities::encode($element->{'URL'});
# Request the Technorati information for each result.
my $technorati_result = get("http://api.technorati.com/bloginf
"url=$url&key=$technoratikey");
# Parse this information.
my $parser = new XML::Simple;
my $parsed_feed = $parser->XMLin($technorati_result);
# If Technorati considers this site to be a weblog,
# go onto the next result. If not, display it, and then go on.
if ($parsed_feed->{document}{result}{weblog}{name}) { next; }
else {
print $cgi->li('<a href="'.$url.'">'.$element->{title}.'</a>');
print $cgi->li("$element->{snippet}");
}
}
print $cgi->end_ul( );
print $cgi->end_html;
Let's step through the meaningful bits of this code. First comes
pulling in the results for the query from Google. Notice the 10 in
the doGoogleSearch; this is the number of search results requested
from Google. You should try to set this as high as Google will allow
whenever you run the script; otherwise, you might find that searching
for terms that are extremely popular in the weblogging world does not
return any results at all, every one of them having been rejected as
originating from a blog.
Since we're about to make a web services call for every one of the
returned results, which might take a while, we want to start
returning the results page now; this helps prevent connection
timeouts. As such, we spit out a header using the CGI module, and
then jump into our loop.
We then get to the final part of our code: actually looping through
the search results returned by Google and passing the HTML-encoded
URL to the Technorati API as a get request. Technorati will then
return its results as an XML document.
Be careful that you do not run out of Technorati requests. As I
write this, Technorati is offering 500 free requests a day, which,
with this script, is around 50 searches. If you make this script
available to your web site audience, you will soon run out of
Technorati requests. One possible workaround is forcing the user to
enter her own Technorati key. You can get the user's key from the
same form that accepts the query. See the "Hacking the Hack" section
for a means of doing this.
Parsing this result is a matter of passing it through XML::Simple.
Since Technorati returns an XML construct containing name only when
the site is thought to be a weblog, we can use the presence of this
construct as a marker. If the program sees the construct, it skips to
the next result. If it doesn't, the site is not thought to be a
weblog by Technorati and we display a link to it, along with the
title and snippet (when available) returned by Google.
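The marker test can be tried without spending any API requests. The following is only a sketch: the two data structures are invented stand-ins shaped like the path the script inspects, not real Technorati output, and is_weblog is a hypothetical name of ours.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: each entry pairs a result URL with a structure shaped
# like the one the script inspects after parsing Technorati's reply.
my @results = (
    { url    => 'http://blog.example.com/',
      parsed => { document => { result => { weblog => { name => 'A Blog' } } } } },
    { url    => 'http://www.example.org/',
      parsed => { document => { result => {} } } },
);

# True when the weblog name construct is present.
sub is_weblog {
    my ($parsed) = @_;
    return $parsed->{document}{result}{weblog}{name} ? 1 : 0;
}

# Keep only the results that are not flagged as weblogs.
my @kept = grep { !is_weblog($_->{parsed}) } @results;
print "$_->{url}\n" for @kept;
```

Using grep to build the filtered list first, rather than calling next inside the loop, makes it easy to count how many results survive before printing anything.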
2.27.2. Running the Hack
Point your browser at the form googletech.html.
2.27.3. Hacking the Hack
As mentioned previously, this script can burn through your
Technorati allowances rather quickly under heavy use. The simplest
way of solving this is to force the end user to supply his own
Technorati key. First, add a new input to your HTML form for the
user's key:
Your Technorati key: <input type="text" name="key">
Then, suck in the user's key as a replacement to your own:
# Set up the query term
# from the CGI input.
my $query = param("q");
$technoratikey = param("key");
Ben Hammersley
Hack 46. Spot Trends with Geotargeting
Compare the relative popularity of a trend or fashion in different
locations, using only Google and Directi search results.
One of the latest buzzwords on the Internet is geotargeting, which
is just a fancy name for the process of matching hostnames (e.g.,
http://www.oreilly.com) to addresses (e.g., 208.201.239.36) to
country names (e.g., U.S.). The whole thing works because there are
people who compile such databases and make them readily available.
This information must be compiled by hand or at least
semiautomatically because the DNS system that resolves hostnames to
addresses does not store it in its distributed database.
While it is possible to add geographic location data to DNS records,
it is highly impractical to do so. However, since we know which
addresses have been assigned to which businesses, governments,
organizations, or educational establishments, we can assume with a
high probability that the geographic location of the institution
matches that of its hosts, at least for most of them. For example, if
the given address belongs to the range of addresses assigned to
British Telecom, then it is highly probable that it is used by a host
located within the territory of the United Kingdom.
Why go to such lengths when a simple DNS lookup (e.g., nslookup
208.201.239.36) gives the name of the host, and in that name we can
look up the top-level domain (e.g., .pl, .de, or .uk) to find out
where this particular host is located? There are four good reasons
for this:
• Not all lookups on addresses return hostnames.
• A single address might serve more than one virtual host.
• Some country domains are registered by foreigners and hosted on
  servers on the other side of the globe.
• .com, .net, .org, .biz, or .info domains tell us nothing about the
  geographic location of the servers they are hosted on.
That's where geotargeting can help.
Geotargeting is by no means perfect. For example, if an
international organization such as AOL gets a large chunk of
addresses that it uses not only for servers in the U.S., but also in
Europe, the European hosts might be reported as being based in the
U.S. Fortunately, such aberrations do not constitute a large
percentage of addresses.
The first users of geotargeting were advertisers, who thought it
would be a neat idea to serve local advertising. In other words, if a
user visits a New York Times site, the ads they see depend on their
physical location. Those in the U.S. might see the ads for the latest
Chrysler car, while those in Japan might see ads for i-mode; users
from Poland might see ads for "Ekstradycja" (a cult Polish police TV
series), and those in India might see ads for the latest Bollywood
movie. While such use of geotargeting might maximize the return on
the invested dollar, it also goes against the idea behind the
Internet, which is a global network. (In other words, if you are
addressing a global audience, don't try to hide from it by
compartmentalizing it.) Another problem with geotargeted ads is that
they follow the viewer. Advertisers must love it, but it is annoying
to the user; how would you feel if you saw the same ads for your
local burger bar everywhere you went in the world?
Another application of geotargeting is to serve content in the local
language. The idea is really nice, but it's often poorly implemented
and takes a lot of clicking to get to the pages in other languages.
The local pages have a habit of returning out of nowhere, especially
after you upgrade your web browser. A much more interesting
application of geotargeting is the analysis of trends, which is
usually done in two ways: by analyzing server logs and by analyzing
the results of Google queries.
Server log analysis is used to determine the geographic location of
your visitors. For example, you might discover that your company's
site is being visited by a large number of people from Japan. Perhaps
that number is so significant that it would justify the rollout of a
Japanese version of your site. Or it might be a signal that your
company's products are becoming popular in that country and you
should spend more marketing dollars there. But if you run a server
for U.S. expatriates living in Tokyo, the same information might mean
that your site is growing in popularity and you need to add more
information in English. This method is based on the list of addresses
of hosts that connect to the server, stored in your server's access
log. You could write a script that looks up their geographic location
to find out where your visitors come from. It is more accurate than
looking up top-level domains, although it's a little slower due to
the number of DNS lookups that need to be done.
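As a starting point for such a script, here is a minimal sketch that counts requests per client address. It assumes the server writes Common Log Format, and the sample lines are invented (the addresses come from ranges reserved for documentation); hits_per_address is a hypothetical name of ours.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical access-log lines in Common Log Format; the addresses are
# from documentation ranges and are not real visitors.
my @log = (
    '192.0.2.10 - - [10/Oct/2004:13:55:36 -0700] "GET / HTTP/1.0" 200 2326',
    '198.51.100.7 - - [10/Oct/2004:13:56:01 -0700] "GET /a.html HTTP/1.0" 200 512',
    '192.0.2.10 - - [10/Oct/2004:13:57:12 -0700] "GET /b.html HTTP/1.0" 200 1024',
);

# Count requests per client address (the first field of each line).
sub hits_per_address {
    my %hits;
    for my $line (@_) {
        next unless $line =~ /^(\d{1,3}(?:\.\d{1,3}){3})\s/;
        $hits{$1}++;
    }
    return %hits;
}

my %hits = hits_per_address(@log);
for my $addr (sort { $hits{$b} <=> $hits{$a} || $a cmp $b } keys %hits) {
    print "$hits{$addr} $addr\n";
}
```

Each distinct address then needs only one location lookup, however many requests it made, which keeps the slow part of the job as small as possible.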
Another interesting use of geotargeting is analysis of the spread
of trends. This can be done with a simple script that plugs into
the Google API and the IP-to-Country database provided by
Directi (http://ip-to-country.directi.com). The idea behind trend
analysis is simple: perform repetitive queries using the same
keywords, but change the language of results and top-level
domains for each query. Compare the number of results returned
for each language, and you will get a good idea of the spread of
the analyzed trend across cultures. Then, compare the number
of results returned for each top-level domain, and you will get a
good idea of the spread of the analyzed trend across the globe.
Finally, look up geographic locations of hosts to better
approximate the geographic spread of the analyzed trend.
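The repetitive querying described above amounts to walking a grid of languages and top-level domains. The following Python sketch only builds the list of queries; it does not contact Google. The function name trend_queries and the sample language and domain values are assumptions chosen for illustration.

```python
from itertools import product

def trend_queries(keywords, languages, domains):
    """Yield (language, domain, query) for every language/domain pair."""
    for language, domain in product(languages, domains):
        # site: restricts results to one top-level domain.
        yield language, domain, f"{keywords} site:{domain}"

# Hypothetical run: one keyword, two languages, two top-level domains.
for language, domain, query in trend_queries("amphetadesk", ["en", "fr"], ["com", "de"]):
    print(f"language={language}  query={query}")
```

Recording the result count for each combination gives you the table to compare across languages and domains.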
You might discover some interesting things this way: it could
turn out that a particular .com domain that serves a significant
number of documents and that contained the given query in
Japanese is located in Germany. It might be a sign that there is
a large Japanese community in Germany that uses that
particular .com domain for their portal. Shouldn't you be trying to
get in touch with them?
The geospider.pl script shown in this hack is a sample
implementation of this idea. It queries Google and then matches
the names of hosts in returned URLs against the IP-to-Country
database.
2.28.1. The Code
Save the following code ["How to Run the Hacks" in the Preface]
as geospider.pl.
You will need the Getopt::Std and
Net::Google modules for this script. You'll
also need a Google API key
(http://api.google.com) and the latest
ip-to-country.csv database
(http://ip-to-country.webhosting.info/downloads/ip-to-country.csv.zip).
#!/usr/bin/perl -w
# geospider.pl
# Geotargeting spider -- queries Google through the Google API, extracts
# hostnames from returned URLs, looks up addresses of hosts, and matches
# addresses of hosts against the IP-to-Country database from Directi:
# ip-to-country.directi.com. For more information about this software, see
# http://www.artymiak.com/software or contact [email protected].
# This code is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.
#
use strict;
use Getopt::Std;
use Net::Google;
use constant GOOGLEKEY => 'insert key here';
use Socket;
my $help = <<"EOH";
------------------------------------------------------------------
Geotargeting trend analysis spider
------------------------------------------------------------------
Options:
-h prints this help
-q query in utf8, e.g. 'Spidering Hacks'
-l language codes, e.g. 'en fr jp'
-d domains, e.g. '.com'
-s which result should be returned first (count starts from 0)
-n how many results should be returned, e.g. 700
------------------------------------------------------------------
EOH
# Define our arguments and show the
# help if asked, or if missing query.
my %args; getopts("hq:l:d:s:n:", \%args);
die $help if exists $args{h};
die $help unless $args{'q'};
# Create the Google object.
my $google = Net::Google->new(key=>GOOGLEKEY);
my $search = $google->search( );
# Languages (space-separated list), defaulting to English.
$search->lr($args{l} ? split(' ', $args{l}) : "en");
# What search result to start at, defaulting to 0.
$search->starts_at($args{'s'} || 0);
# How many results, defaulting to 10.
$search->max_results($args{'n'} || 10);
# Input and output encoding.
$search->ie(qw(utf8)); $search->oe(qw(utf8));
my $querystr; # our final string for searching.
if ($args{d}) { $querystr = "$args{q} site:$args{d}"; } # domain-specific searching.
else { $querystr = $args{'q'} }
# Load in our lookup list from
# http://ip-to-country.directi.com/.
my $file = "ip-to-country.csv";
print STDERR "Trying to open $file... \n";
open (FILE, "<$file") or die "[error] Couldn't open $file: $!\n";
# Now load the whole shebang into memory.
print STDERR "Database opened, loading... \n";
my (%ip_from, %ip_to, %code2, %code3, %country);
my $counter=0; while (<FILE>) {
chomp; my $line = $_; $line =~ s/"//g; # strip all quotes.
my ($ip_from, $ip_to, $code2, $code3, $country) = split(/,/, $line);
# Remove leading zeros.
$ip_from =~ s/^0{0,10}//g;
$ip_to =~ s/^0{0,10}//g;
# And assign to our permanents.
$ip_from{$counter} = $ip_from;
$ip_to{$counter} = $ip_to;
$code2{$counter} = $code2;
$code3{$counter} = $code3;
$country{$counter} = $country;
$counter++; # move on to next line.
}
close(FILE);
$search->query(qq($querystr));
print STDERR "Querying Google with $querystr... \n";
print STDERR "Processing results from Google... \n";
# For each result from Google, display
# the geographic information we've found.
foreach my $result (@{$search->response( )}) {
print "-" x 80 . "\n";
print " Search time: " . $result->searchTime( ) . "s\n";
print " Query: $querystr\n";
print " Languages: " . ( $args{l} || "en" ) . "\n";
print " Domain: " . ( $args{d} || "" ) . "\n";
print " Start at: " . ( $args{'s'} || 0 ) . "\n";
print "Return items: " . ( $args{n} || 10 ) . "\n";
print "-" x 80 . "\n";
map {
print "url: " . $_->URL( ) . "\n";
my @addresses = get_host($_->URL( ));
if (scalar @addresses != 0) {
match_ip(get_host($_->URL( )));
} else {
print "address: unknown\n";
print "country: unknown\n";
print "code3: unknown\n";
print "code2: unknown\n";
} print "-" x 50 . "\n";
} @{$result->resultElements( )};
}
# Get the IPs for
# matching hostnames.
sub get_host {
my ($url) = @_;
# Chop the URL down to just the hostname.
my $name = substr($url, 7); $name =~ m/\//g;
$name = substr($name, 0, pos($name) - 1);
print "host: $name\n";
# And get the matching IPs.
my @addresses = gethostbyname($name);
if (scalar @addresses != 0) {
@addresses = map { inet_ntoa($_) } @addresses[4 .. $#addresses];
} else { return; } # empty list, so the caller sees no addresses.
return "@addresses";
}
# Check our IP in the
# Directi list in memory.
sub match_ip {
my (@addresses) = split(/ /, "@_");
foreach my $address (@addresses) {
print "address: $address\n";
my @classes = split(/\./, $address);
my $p; foreach my $class (@classes) {
$p .= pack("C", int($class));
} $p = unpack("N", $p);
my $counter = 0;
foreach (keys %ip_to) {
if ($p <= int($ip_to{$counter})) {
print "country: " . $country{$counter} . "\n";
print "code3: " . $code3{$counter} . "\n";
print "code2: " . $code2{$counter} . "\n";
last;
} else { ++$counter; }
}
}
}
Be sure to replace insert key here with your Google API key.
2.28.2. Running the Hack
Here, we're querying to see how much worldly penetration
AmphetaDesk, a popular news aggregator, has, according to
Google's top search results:
% perl geospider.pl -q "amphetadesk"
Trying to open ip-to-country.csv...
Database opened, loading...
Querying Google with amphetadesk...
Processing results from Google...
--------------------------------------------------------------
Search time: 0.081432s
Query: amphetadesk
Languages: en
Domain:
Start at: 0
Return items: 10
--------------------------------------------------------------
url: http://www.macupdate.com/info.php/id/9787
host: www.macupdate.com
host: www.macupdate.com
address: 64.5.48.152
country: UNITED STATES
code3: USA
code2: US
--------------------------------------------------
url: http://allmacintosh.forthnet.gr/preview/214706.html
host: allmacintosh.forthnet.gr
host: allmacintosh.forthnet.gr
address: 193.92.150.100
country: GREECE
code3: GRC
code2: GR
--------------------------------------------------
...etc...
2.28.3. Hacking the Hack
This script is only a simple tool. You will make it better, no doubt.
The first thing you could do is implement a more efficient way to
query the IP-to-Country database. Storing data from
ip-to-country.csv in a database would speed script startup time by
several seconds. Also, the answers to address-to-country
queries could be obtained much faster.
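Short of a full database, one way to speed up the address-to-country queries is to keep the ranges sorted and use a binary search instead of a linear scan. The Python sketch below shows the idea under stated assumptions: the two ranges in ROWS are invented placeholders in the same column order as ip-to-country.csv (ip_from, ip_to, code2, code3, country), not real entries from the database, and the function names are hypothetical.

```python
import bisect
import ipaddress

def ip_to_int(address):
    """Convert a dotted-quad IPv4 address to its integer value."""
    return int(ipaddress.IPv4Address(address))

# Placeholder ranges for illustration only; real rows come from the CSV file.
ROWS = sorted([
    (ip_to_int("64.5.0.0"), ip_to_int("64.5.255.255"), "US", "USA", "UNITED STATES"),
    (ip_to_int("193.92.0.0"), ip_to_int("193.92.255.255"), "GR", "GRC", "GREECE"),
])
STARTS = [row[0] for row in ROWS]  # sorted range starts, for bisect

def lookup(address):
    """Return the row whose range contains address, or None."""
    value = ip_to_int(address)
    # Find the last range that starts at or before the address.
    index = bisect.bisect_right(STARTS, value) - 1
    if index >= 0 and value <= ROWS[index][1]:
        return ROWS[index]
    return None

row = lookup("193.92.150.100")
print(row[4] if row else "unknown")
```

With tens of thousands of ranges, each lookup then takes a handful of comparisons rather than a walk through the whole list.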
You might ask if it wouldn't be easier to write a spider that
doesn't use the Google API and instead downloads page after
page of results returned by Google at http://www.google.com.
Yes, it is possible, and it is also the quickest way to get your
script blacklisted for breaching Google's user agreement.
Google is not only the best search engine, it is also one of the
best-monitored sites on the Internet.
Jacek Artymiak
Hack 47. Bring the Google Calculator to
the Command Line
Perform feats of calculation on the command line, powered by
the magic of the Google calculator.
Everyone, whether they admit it or not, forgets how to use the
Unix dc command-line calculator a few moments after they figure
it out for the nth time and stumble through the calculation at
hand. And, let's face it, the default desktop (and I mean
computer desktop) calculator usually doesn't go beyond the
basics: add, subtract, multiply, and divide; you'll have some
grouping ability with clever use of M+, M-, and MR, if you're
lucky.
What if you're interested in more than simple math? I've lived in
the U.S. for years now and still don't know a yard from three feet
(I know now, thanks to the Google Calculator), let alone
converting ounces to grams or stone to kilograms.
This two-line PHP script by Adam Trachtenberg
(http://www.trachtenberg.com) brings the Google calculator to
your command line so that you don't have to skip a beat (or open
your browser) when you just need to calculate something quickly.
2.29.1. The Code
The script uses PHP (http://www.php.net), better known as a web
programming and templating language, on the command line,
passing your calculation query to Google, scraping the returned
results, and dropping the answer into your virtual lap.
This hack assumes that you have PHP
installed on your computer and it lives in
the /usr/bin directory. If PHP is
somewhere else on your system, you
should alter the path on the first line
accordingly (e.g., #!/usr/local/bin/php4).
#!/usr/bin/php
<?php
preg_match_all('{<b>.+= (.+?)</b>}',
file_get_contents('http://www.google.com/search?q=' .
urlencode(join(' ', array_splice($argv, 1)))), $matches);
print str_replace('<font size=-2> </font>', ',',
"{$matches[1][0]}\n");
?>
Save the code to a file called calc in your path (I keep such
things in a bin in my home directory) and make it available to run
by typing chmod +x calc.
2.29.2. Running the Hack
Invoke your new calculator on the command line ["How to Run
the Hacks" in the Preface] by typing calc (or ./calc if you're in
the same directory and don't feel like fiddling about with paths)
followed by any Google calculator query that you might run
through the regular Google web search interface.
Here are a few examples:
% calc "21 * 2"
42
% calc 26 ounces + 1 pint in ounces
42 US fluid ounces
% calc pi
3.14159265
% calc answer to life, the universe and everything
42
If your shell gives you a parse error or returns garbage, try
placing the calculation inside quotation marks. For example, calc
21 * 2, without the double quotes in the previous example,
returns $int($<b>calc, because the shell expands the unquoted *
into the names of the files in the current directory before the
script ever sees it.
There's absolutely no error checking in
this hack, so if you enter something that
Google doesn't think is a calculation,
you'll likely get garbage or nothing at all.
Likewise, remember that if Google changes
its HTML output, the regular expression
could fail; after all, as we'll point out
several times in this book, scraping web
pages is a brittle affair. That said, if this
were made more robust, it'd no longer be a
hack, now would it?
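If you do want a little more robustness, the scraping step is the place to start. Here is a minimal Python sketch of the same regular expression with an explicit check for the no-match case. It works on an HTML string you pass in rather than fetching anything; the function name extract_answer and the sample HTML fragments are assumptions for illustration, not Google's actual output.

```python
import re

# Same pattern as the PHP script: bold text of the form "<b>query = answer</b>".
ANSWER = re.compile(r"<b>.+= (.+?)</b>")

def extract_answer(html):
    """Return the calculator answer found in html, or None if there is none."""
    match = ANSWER.search(html)
    if match is None:
        return None  # not a calculation, or the page layout has changed
    # Mirror the PHP script's cleanup: this font tag stands in for a
    # thousands separator.
    return match.group(1).replace("<font size=-2> </font>", ",")

# Hypothetical fragments standing in for a results page.
print(extract_answer("<p><b>21 * 2 = 42</b></p>"))
print(extract_answer("<p>No calculation here</p>"))
```

Returning None instead of an empty string lets the caller print a friendly message rather than garbage.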
Hack 48. Build a Custom Date Range
Search Form
Search only Google pages indexed today, yesterday, the last 7
days, or last 30 days.
Google has a date-based search [Hack #16] but uses Julian
dates. Most people can't convert Gregorian to Julian in their
heads. But with a conversion formula and a little Perl scripting,
you can have a Google search form that offers to let users
search Google pages indexed today, yesterday, the last 7 days,
or the last 30 days.
2.30.1. The Form
The frontend to the script is a simple HTML form:
<form action="http://path/to/cgi-bin/goofresh.cgi"
method="get">
Search for:<br />
<input type="text" name="query" size="30" />
<p />
Search for pages indexed how many days back?<br />
<select name="days_back">
<option value="0">Today</option>
<option value="1">Yesterday</option>
<option value="7">Last 7 Days</option>
<option value="30">Last 30 Days</option>
</select>
<p />
<input type="submit" value="Search">
</form>
The form prompts for two user inputs. The first is a Google query,
complete with support for special syntax ["Special Syntax" in
Chapter 1] and syntax mixing ["Mixing Syntax" in Chapter 1];
after all, we'll just be passing your query along to Google itself.
The second input, a pull-down list, prompts for how many days'
worth of search the form should perform.
2.30.2. The Code
Note that this script just does a couple of date translations in
Perl and redirects the browser to Google, altered query in tow.
It's just a regular query as far as Google is concerned, so it
doesn't require a developer's API key.
This hack requires an additional module,
Time::JulianDay, and won't run without it
(http://search.cpan.org/search?query=Time%3A%3AJulianDay).
#!/usr/local/bin/perl
# goofresh.cgi
# Searches for recently indexed files on Google.
# Usage: goofresh.cgi is called as a CGI with form input,
# redirecting the browser to Google, altered query in tow.
use CGI qw/:standard/;
use Time::JulianDay;
# Build a URL-escaped query.
(my $query = param('query')) =~ s#(\W)#sprintf("%%%02x", ord($1))#ge;
# How many days back?
my $days_back = int param('days_back') || 0;
# What's the current Julian date?
my $julian_date = int local_julian_day(time);
# Redirect the browser to Google with query in tow.
print redirect(
'http://www.google.com/search?num=100' .
"&q=$query" .
"+daterange%3A" . ($julian_date - $days_back) . "-$julian_date"
);
2.30.3. Running the Hack
Point your browser at the location of the form you just created.
Enter a query, choose how many days to go back, and click the
Search button. You'll be sent on to Google with the appropriate
daterange: restriction in tow.
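To see what the date translation boils down to, here is a minimal Python sketch of the same idea: convert a calendar date to a Julian day number and build the daterange: URL. It relies on the fact that Python's date.toordinal() differs from the Julian day number by a constant (1721425); the function names are hypothetical, and the sketch is an illustration of the technique rather than a port of goofresh.cgi.

```python
from datetime import date
from urllib.parse import quote_plus

# Julian day number minus Python's proleptic Gregorian ordinal.
JDN_OFFSET = 1721425

def julian_day(day):
    """Return the Julian day number for a datetime.date."""
    return day.toordinal() + JDN_OFFSET

def daterange_url(query, days_back, today=None):
    """Build a Google search URL limited to the last days_back days."""
    today = today or date.today()
    end = julian_day(today)
    start = end - days_back
    full_query = f"{query} daterange:{start}-{end}"
    return "http://www.google.com/search?num=100&q=" + quote_plus(full_query)

print(daterange_url('"digital archives"', 7))
```

Passing today explicitly, as in daterange_url("perl", 1, today=date(2000, 1, 1)), makes the function easy to check against a known Julian day number (2451545 for January 1, 2000).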
2.30.4. Hacking the Hack
If you don't like the date ranges hardcoded into the form, make
up your own and adjust the form accordingly:
<form action="http://path/to/cgi-bin/goofresh.cgi"
method="get">
Search for:<br />
<input type="text" name="query" size="30" />
<p />
Search for pages indexed how many days back?<br />
<select name="days_back">
<option value="0">Today</option>
<option value="30">Around 1 Month</option>
<option value="60">Around 2 Months</option>
<option value="90">Around 3 Months</option>
<option value="365">1 Year</option>
</select>
<p />
<input type="submit" value="Search">
</form>
Or simply let the user specify how many days to go back in a
text field:
<form action="http://path/to/cgi-bin/goofresh.cgi"
method="get">
Search for:<br />
<input type="text" name="query" size="30" />
<p />
Search for pages indexed how many days back?<br />
<input type="text" name="days_back" size="4"
maxlength="4" />
<p />
<input type="submit" value="Search">
</form>
Hack 49. Search Yesterday's Index
Monitor a set of queries for new finds added to the Google index
yesterday.
[Hack #48] is a simple web form-driven CGI script for building
date range Google queries. A simple web-based interface is fine
when you want to search for only one or two items at a time. But
what of performing multiple searches over time, saving the
results to your computer for comparative analysis?
A better fit for this task is a client-side application that you run
from the comfort of your own computer's desktop. This Perl
script feeds specified queries to Google via the Google Web API,
limiting results to those indexed yesterday. New finds are
appended to a comma-delimited text file per query, suitable for
import into Excel or your average database application.
This hack requires an additional Perl
module, Time::JulianDay
(http://search.cpan.org/author/MUIR/); it
just won't work until you have the module
installed.
2.31.1. The Queries
First, you'll need to prepare a few queries to feed the script. Try
these out via the Google search interface itself first to make
sure you're receiving the kind of results you're expecting. Your
queries can be anything that you'd be interested in tracking over
time: topics of long-lasting or current interest, searches for new
directories of information [Hack #1] coming online, unique
quotes from articles, or other sources that you want to monitor
for signs of plagiarism.
Use whatever special syntaxes you like except for link:; as you
might remember, link: can't be used in concert with any other
special syntax such as daterange:, upon which this hack relies. If
you insist on trying anyway (e.g., link:www.yahoo.com
daterange:2452421-2452521), Google will simply treat link as yet
another query word (e.g., link www.yahoo.com), yielding some
unexpected and useless results.
Put each query on its own line. A sample query file will look
something like this:
"digital archives"
intitle:"state library of"
intitle:directory intitle:resources
"now * * time for all good men * come * * aid * * party"
Save the text file somewhere memorable; alongside the script
you're about to write is as good a place as any.
2.31.2. The Code
Save the following code as goonow.pl. Be sure to replace insert
key here with your Google API key along the way.
#!/usr/local/bin/perl -w
# goonow.pl
# Feeds queries specified in a text file to Google, querying
# for recent additions to the Google index. The script appends
# to CSV files, one per query, creating them if they don't exist.
# usage: perl goonow.pl [query_filename]
# My Google API developer's key.
my $google_key='insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
use strict;
use SOAP::Lite;
use Time::JulianDay;
$ARGV[0] or die "usage: perl goonow.pl [query_filename]\n";
my $julian_date = int local_julian_day(time) - 2;
my $google_search = SOAP::Lite->service("file:$google_wdsl");
open(QUERIES, $ARGV[0]) or die "Couldn't read $ARGV[0]: $!";
while (my $query = <QUERIES>) {
chomp $query;
warn "Searching Google for $query\n";
# Name the output file after the bare query, before daterange: is added.
(my $outfile = $query) =~ s/\W/_/g;
$query .= " daterange:$julian_date-$julian_date";
open (OUT, ">> $outfile.csv")
or die "Couldn't open $outfile.csv: $!\n";
my $results = $google_search ->
doGoogleSearch(
$google_key, $query, 0, 10, "false", "", "false",
"", "latin1", "latin1"
);
foreach (@{$results->{'resultElements'}}) {
print OUT '"' . join('","', (
map {
s!\n!!g; # drop spurious newlines
s!<.+?>!!g; # drop all HTML tags
s!"!""!g; # double escape " marks
$_;
} @$_{'title','URL','snippet'}
) ) . "\"\n";
}
}
You'll notice that GooNow checks the day before yesterday's
rather than yesterday's additions (my $julian_date = int
local_julian_day(time) - 2;). Google indexes some pages very
frequently; these show up in yesterday's additions and really
bulk up your search results. So if you search yesterday's
additions, you'll get a lot of noise (pages that Google reindexes
every day) rather than the fresh content that you're after.
Skipping back one more day is a nice hack to get around the
noise.
2.31.3. Running the Hack
This script is invoked on the command line ["How to Run the
Hacks" in the Preface] like so:
$ perl goonow.pl query_filename
where query_filename is the name of the text file holding all the
queries to be fed to the script. The file can be located either in
the local directory or elsewhere; if the latter, be sure to include
the entire path (e.g., /mydocu~1/hacks/queries.txt).
Bear in mind that all output is directed to CSV files, one per
query, so don't expect any fascinating output on the screen.
2.31.4. The Results
Here's a quick look at one of the CSV output files created,
intitle_state_library_of_.csv:
"State Library of Louisiana","http://www.state.lib.la.us/"," ... Click here if you have any questions or comments. Copyright © 1998-2001 State Library of Louisiana Last modified: August 07, 2002. "
"STATE LIBRARY OF NEW SOUTH WALES, SYDNEY AUSTRALIA","http://www.slnsw.gov.au/"," ... State Library of New South Wales Macquarie St, Sydney NSW Australia 2000 Phone: +61 2 9273 1414 Fax: +61 2 9273 1255. Your comments You could win a prize! ... "
"State Library of Victoria","http://www.slv.vic.gov.au/"," ... clicking on our logo. State Library of Victoria Logo with link to homepage State Library of Victoria. A world class cultural resource ... "
...
2.31.5. Hacking the Hack
The script keeps appending new finds to the appropriate CSV
output file. If you wish to reset the CSV files associated with
particular queries, simply delete them, and the script will create
them anew.
Or you can make one slight adjustment to have the script create
the CSV files anew each time, overwriting the previous version,
like so:
...
(my $outfile = $query) =~ s/\W/_/g;
open (OUT, "> $outfile.csv")
or die "Couldn't open $outfile.csv: $!\n";
my $results = $google_search ->
doGoogleSearch(
$google_key, $query, 0, 10, "false", "", "false",
"", "latin1", "latin1"
);
...
Notice the only change in the code is the removal of one of the >
characters when the output file is created, i.e., open (OUT, ">
$outfile.csv") instead of open (OUT, ">> $outfile.csv").
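The difference between appending and overwriting, along with the quote-doubling that keeps the CSV importable, can be sketched in a few lines of Python. This is an illustration of the two file modes, not a port of goonow.pl; the function names, file name, and sample rows are all made up for the example.

```python
import csv
import os
import tempfile

def save_results(path, rows, overwrite=False):
    """Write rows as fully quoted CSV, appending unless overwrite is set."""
    mode = "w" if overwrite else "a"  # the equivalent of Perl's > and >>
    with open(path, mode, newline="", encoding="utf-8") as handle:
        # QUOTE_ALL wraps every field in quotes and doubles embedded quotes.
        csv.writer(handle, quoting=csv.QUOTE_ALL).writerows(rows)

def count_rows(path):
    """Count the records currently stored in the CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return sum(1 for _ in csv.reader(handle))

# Hypothetical output file in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "digital_archives.csv")
save_results(path, [("First find", "http://example.com/1", 'a "quoted" snippet')])
save_results(path, [("Second find", "http://example.com/2", "another snippet")])
print(count_rows(path))  # both runs appended
save_results(path, [("Fresh start", "http://example.com/3", "only row")], overwrite=True)
print(count_rows(path))  # the overwrite discarded the earlier rows
```

Appending suits long-running monitoring, while overwriting suits a report you regenerate from scratch each day.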
Chapter 3. Images
Hacks 50-53
Section 3.2. Google Images Advanced Search Interface
Section 3.3. Google Images Search Syntax
Hack 50. Borrow a Corporate or Product Logo
Hack 51. Browse the World Wide Photo Album
Hack 52. Google Cartography: Street Art in Your Neighborhood
Hack 53. Capture the Map
Hacks 50-53
Take a break from all that text and check out Google Images
(images.google.com), an index of just under 900 million images
available on the Web. While sorely lacking in special syntaxes,
the Advanced Image Search
(http://images.google.com/advanced_image_search) does offer
some interesting options.
Of course, any options on the Advanced
Image Search page can be expressed via
a little URL hacking ["Understanding Google
URLs" in Chapter 1].
Google's Image Search starts with a plain keyword search.
Images are indexed under a variety of keywords, some broader
than others; be as specific as possible. If you're searching for
cats, don't use cat as a keyword unless you don't mind getting
results that include "cat scan." Use words that are more
uniquely cat related, such as feline or kitten. Narrow down your
query as much as possible, using as few words as possible. A
query like feline fang, which would get you over 75,800 results
on Google, will get you only three results on Google Image
Search; in this case, cat fang works better. (Building queries for
image searching takes a lot of patience and experimentation.)
Search results include a thumbnail, name, size (both pixels and
kilobytes), and the URL where the picture is to be found. Clicking
the picture will present a framed page, Google's thumbnail of the
image at the top, and the page where the image originally
appeared at the bottom. Figure 3-1 shows a typical Google
Images result after choosing and clicking one of the images
found by your search.
Figure 3-1. A Google Images search result
Searching Google Images can be a real crapshoot because it's
difficult to build multiple-word queries, and single-word queries
lead to thousands of results. You do have more options to narrow
your search both through the Advanced Image Search interface
and through the Google Image Search special syntaxes.
3.2. Google Images Advanced Search
Interface
The Google Advanced Image Search
(http://images.google.com/advanced_image_search) allows you
to specify the size (expressed in pixels, not kilobytes) of the
returned image. You can also specify the kind of pictures that
you want (Google Images indexes only JPEG and GIF files),
image color (black and white, grayscale, or full color), and any
domain to which you wish to restrict your search.
Google Image search also uses three levels of filtering: none,
moderate, and strict. Moderate filters only explicit images, while
strict filters both images and text. While automatic filtering
doesn't guarantee that you won't find any offensive content, it
will help. However, sometimes filtering works against you. If
you're searching for images related to breast cancer, Google's
strict filtering will cut down greatly on your potential number of
results. Any time you're using a word that might be considered
offensive, even in an innocent context, you'll have to consider
turning off the filters or risk missing relevant results. One way to
get around the filters is to try alternate words. If you're
searching for breast cancer images, try searching for
mammograms or Tamoxifen, a drug used to treat breast cancer.
3.3. Google Images Search Syntax
Google Images offers a few special syntaxes:
intitle:
Finds keywords in the page title. This is an excellent
way to narrow down search results.
intitle:paramecium
filetype:
Finds pictures of a particular type: JPEG, GIF, or PNG.
Note that searching for filetype:jpg and filetype:jpeg
will get you different results because the filtering is
based on file extension, not some deeper understanding
of the file type.
filetype:jpg paramecium
inurl:
As with any regular Google search, inurl: finds the
search term in the URL. The results for this one can be
confusing. For example, you may search for inurl:cat
and get the following URL as part of the search result:
www.example.com/something/somethingelse/something.html
Hey, where's the cat? Because Google indexes the
graphic name as part of the URL, it's probably there. If
the page above includes a graphic named cat.jpg, that's
what Google is finding when you search for inurl:cat. It's
finding the cat in the name of the picture, not in the URL
itself.
inurl:cat
site:
As with any other Google web search, site: restricts
your results to a specified host or domain. Don't use this
to restrict results to a certain host unless you're really
sure what's there. Instead, use it to restrict results to
certain domains. For example, searching for
football site:uk and then for football site:com
is a good example of how dramatic a difference
using site: can make.
site:amazon.com shakespeare
Hack 50. Borrow a Corporate or Product
Logo
Add a bit of spice to your presentation or school report by using
a corporate, project, product, or service logo.
You have a presentation or proposal to make and want to add a
hint of your target audience's branding. Or perhaps you want to
spice up a school report on a company, product, or service. You
visit their web site and find that every instance of their logo
would need some heavy editing to get rid of background clutter,
toolbar bits, and so forth.
There are a few ways that Google can help you.
This hack began as a discussion at
http://hacks.oreilly.com/pub/h/227 and
owes a debt to the comments on that page.
3.4.1. Google Images
Point your browser at Google Images
(http://images.google.com) and search for the company, project,
product, or service name, grouped together in double quotes (") if
you think this needs to be explicit, and a modifier signifying what
sort of image you're after: logo, emblem, mascot, crest, "coat of
arms", etc. On the whole, logo seems to work best. Here are some
examples:
"microsoft research" logo
"harvard university" crest
"apache software foundation" logo
Google Images will usually return a virtual gallery of logos. And,
chances are, one of them is an unadulterated version on a plain
white or black background. Figure 3-2 shows what turns up if you
search for "microsoft research" logo.
Figure 3-2. Microsoft Research logos turned up
by Google Images
Save the image, usually with a right mouse click (Windows) or
Control-click (Macintosh) and Save Picture As... or the like, and
drag it right into your PowerPoint or Word document. You can
then use the application's basic built-in image editing tools to crop,
rotate, or otherwise frame the logo nicely.
While most images produced for the Web wouldn't translate well
if you were to print them out, they're usually good enough for a
slide presentation, web page mockup, or school project.
For a better version, you might try a Google Web search for
"company/project/product/service name" logo filetype:tif. For a
logo you can scale and otherwise manipulate, try using
filetype:eps or filetype:pdf.
3.4.2. Annual Reports
Public companies' annual reports tend to be rather bland affairs
emblazoned with a corporate logo and often provided online in
PDF form. Perform a Google Web search for "company name"
"annual report" filetype:pdf. If you care about print quality, open
the PDF in Adobe Illustrator or the like, grab the logo, scale, and
add to your own document as needed.
Just remember that scraping a low-resolution GIF or JPG off a
web site will work great for screen presentations (Web,
PowerPoint, etc.) but will not work for print or slides. When
incorporating a low-resolution GIF or JPG into a printed piece,
slide, or overhead, the output will look jaggy. Instead, add EPS to
the search terms and search Google Web (not Images) to find
the company's style guide, which usually includes
high-resolution vector logos (.eps, .ai, etc.).
Hack 51. Browse the World Wide Photo
Album
Take a random stroll through the world's photo album using
some clever Google Image searches (and, optionally, a smidge of
programming know-how).
The proliferation of digital cameras and growing popularity of
camera phones are turning the Web into a worldwide photo
album. It's not only the holiday snaps of your Aunt Minnie or
minutiae of your moblogging friend's day that are available to
you. You can actually take a stroll through the publicly
accessible albums of perfect strangers if you know where to
look. Happily, Google has copies, and a couple of hacks know
just where to look.
3.5.1. Random Personal Picture Finder
Digital photo files have relatively standard filenames (e.g.,
DSC01018.JPG) by default and are usually uploaded to the Web
without being renamed. The Random Personal Picture Finder
(http://www.diddly.com/random) sports a clever little snippet of
JavaScript code that simply generates one of these filenames at
random and queries Google Images for it.
The result, shown in Figure 3-3, is something like looking
through the world's photo album: people eating, working, posing,
and snapping photos of their cats, furniture, or toes. And since
it's a normal Google Images search, you can click on any photo
to see the story behind it, and the other photos nearby.
Neat, huh?
Figure 3-3. The Random Personal Picture Finder
Note that people snap pictures of not just
their toes (or the toes of others). While an
informal series of Shift-Reloads in my
browser turned up only a couple of
questionable bits of photographic work,
you should assume the results are not
workplace- or child-safe.
The code behind the scenes, as I mentioned, is really very
simple: a swatch of JavaScript (view the source of
http://www.diddly.com/random/random.html in your browser to
see the JavaScript bits for yourself) and a list of camera types and
their respective filename structures
(http://www.diddly.com/random/about.html). You're simply
redirected to Google Images with the generated search query in tow.
A smidge of Python illustrates just how simple it is to generate a
link to some random collection of photos shot with a Canon
digital camera:
$ python
Python 2.3 (#1, Sep 13 2003, 00:49:11)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
Type "help", "copyright", "credits" or "license" for more informat
>>> from random import randint
>>> linkform = 'http://images.google.com/images?q=IMG_%s.jpg'
>>> print linkform % str(randint(1, 9999)).zfill(4)
http://images.google.com/images?q=IMG_7931.jpg
You can easily use this as the basis of a CGI script that acts in
the same manner as the Random Personal Picture Finder does.
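As a rough illustration of that idea, here is a minimal Python 3 sketch that picks a filename template and builds the matching Google Images link. Only the DSC01018.JPG and IMG_nnnn.jpg shapes come from the text above; the dictionary keys, the digit ranges, and the function name random_photo_link are assumptions made for this example, not the Finder's actual code.

```python
import random
from urllib.parse import quote_plus

# Filename templates and their highest counter value. The two shapes are
# taken from the text; the keys and ranges are illustrative assumptions.
FILENAME_PATTERNS = {
    "dsc_style": ("DSC{:05d}.JPG", 99999),
    "canon": ("IMG_{:04d}.jpg", 9999),
}

IMAGE_SEARCH = "http://images.google.com/images?q={}"


def random_photo_link(camera, rng=random):
    """Return a Google Images search URL for a randomly numbered photo."""
    template, highest = FILENAME_PATTERNS[camera]
    filename = template.format(rng.randint(1, highest))
    return IMAGE_SEARCH.format(quote_plus(filename))


if __name__ == "__main__":
    print(random_photo_link("canon"))
```

A CGI wrapper would only need to send the returned URL in a redirect; that part is left out here.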
3.5.2. WebCollage
And if you think the Random Personal Picture Finder is fun, you'll
love Jamie Zawinski's WebCollage
(http://www.jwz.org/webcollage/), a Perl script that finds random
pages, strips them of their images, and puts these images
together to create a collage. The collage is remixed and added
to once a minute, your browser reloading a fresh version every
so often (or just hit Shift-Reload on your browser yourself to
speed up the process). Figure 3-4 shows a WebCollage snapshot
in time.
Figure 3-4. A typical WebCollage snapshot of
random pictures gleaned through Internet
search
Click on any one element and you'll be taken to the page that
the image is from.
You can grab the Perl source
(http://www.jwz.org/webcollage/webcollage) and generate a page
similar to Jamie's like so:
$ webcollage -size '800x600' -imagemap ~/public_html/collage
where 800x600 is the size and ~/public_html/collage is the path
and filename under which you'd like to save the image map.
WebCollage is also included in Zawinski's XScreenSaver
(http://www.jwz.org/xscreensaver), a screensaver system for
computers running the X Window System (usually some brand
of Unix).
As Jamie notes, he's disabled the Google
search in WebCollage (you can just
uncomment it in your own downloaded
version) because it's not a Google API-
based application and so is not in keeping
with Google's Terms of Service ["A Note
on Spidering and Scraping" in Chapter 9].
You can, however, very easily write a
Google-specific API-friendly version.
Aaron Swartz
Hack 52. Google Cartography: Street Art
in Your Neighborhood
car·tog·ra·phy n. The art or technique of making maps
or charts.
Google Cartography (found here:
http://richard.jones.name/google-hacks/google-cartography/google-cartography.html)
uses Google via the
Google Search API [Chapter 9] to build a visual representation
of the interconnectivity of streets in an area.
This application takes a starting street and finds streets that
intersect with it. Traversing the streets in a breadth-first manner,
the application discovers more and more intersections,
eventually producing a graph that shows the interconnectivity of
streets flowing from the starting street.
Figures 3-5 and 3-6 show maps generated for two
of the world's great cities, New York and Melbourne,
respectively.
Figure 3-5. New York, U.S., as determined by
Google cartography
Figure 3-6. Melbourne, Australia, mapped by a
little Google cartography
If you know the streets in the areas shown,
you will be able to find inconsistencies
introduced by the text parsing process,
explained in more detail in the following
section.
3.6.1. The Gory Details
Google Cartography uses the Google API to find web pages that
refer to street names. Initial street and region criteria are
combined to form a search query, which is then executed by the
Google API. Each URL from the Google results is fetched and
the content of the pages converted into text. The text is then
processed using pattern matching (programmers, read: regular
expressions) designed to capture information relating to the
relationship between streets (for example, streets that cross
each other or turn into other streets).
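To give a feel for this kind of pattern matching, here is a small Python sketch. The phrasings it looks for, the list of street types, and the function name find_intersections are all assumptions chosen for illustration; the applet's real expressions are not reproduced in this text and are certainly more extensive.

```python
import re

# A street name here is one or more capitalized words followed by a street
# type. Both the word shape and the type list are illustrative assumptions.
STREET = r"[A-Z][a-z]+(?: [A-Z][a-z]+)* (?:Street|Road|Avenue|Lane)"

# Two hypothetical phrasings that relate one street to another.
PATTERNS = [
    re.compile(rf"corner of ({STREET}) and ({STREET})"),
    re.compile(rf"({STREET}) (?:crosses|intersects|turns into) ({STREET})"),
]


def find_intersections(text):
    """Return a set of unordered street pairs mentioned together in text."""
    pairs = set()
    for pattern in PATTERNS:
        for first, second in pattern.findall(text):
            pairs.add(frozenset((first, second)))
    return pairs


if __name__ == "__main__":
    sample = "Our shop is on the corner of High Street and Interesting Road."
    print(find_intersections(sample))
```

Note that a sketch like this shares the limitation described later in this hack: single-name streets such as Broadway slip straight through.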
For each page a list of vertices is produced, where each vertex
represents an intersection between two streets. After the results
for the initial street have been processed, the list of vertices will
hopefully contain some vertices that intersect with the initial
street.
At this point, the mapper performs a breadth-first search
through the streets that can be reached from the initial street.
Each street traversed during this process is subjected to the
same process as the initial street, expanding the list of known
street intersections. This process continues until no more
streets can be reached from the initial street or until a halting
criterion is satisfied (for instance, reaching the maximum amount
of Google API key usage).
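The traversal just described can be sketched in a few lines of Python. This is not the applet's code: lookup is a hypothetical callable standing in for the whole search, fetch, and parse step, and max_queries is an assumed stand-in for the API key usage limit.

```python
from collections import deque


def map_streets(start, lookup, max_queries=50):
    """Breadth-first traversal of the streets reachable from `start`.

    `lookup(street)` is a hypothetical callable returning the set of
    streets found to intersect `street`. Traversal stops when no unvisited
    streets remain or after `max_queries` lookups.
    """
    intersections = set()
    seen = {start}
    queue = deque([start])
    queries = 0
    while queue and queries < max_queries:
        street = queue.popleft()
        queries += 1
        for other in sorted(lookup(street)):
            intersections.add(frozenset((street, other)))
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return intersections


if __name__ == "__main__":
    toy = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
    print(map_streets("A", lambda s: toy.get(s, set())))
```

Because the queue is first-in, first-out, streets closest to the starting street are always expanded before more distant ones.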
Once the data collection phase has completed, the application
converts the intersection vertices into a graph. Each street
becomes a vertex of its own, with outgoing edges connecting to
the vertices of intersecting streets. The connectivity of this
graph is analyzed (using the Jung graph package at
http://jung.sourceforge.net) to determine the largest maximal
subgraph where all pairs of vertices are reachable from each
other. This subgraph almost always contains the starting street;
the probability of this not occurring should decrease
as the number of streets traversed grows (which
naturally selects for streets that are in the same subgraph as
the starting street).
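The applet relies on the Jung package for this analysis. As a language-neutral illustration of the same idea, the following Python sketch finds the largest connected component by plain depth-first search, treating every intersection as an undirected edge. The function name and the input format (a list of street-name pairs) are assumptions for this example.

```python
from collections import defaultdict


def largest_component(pairs):
    """Return the largest set of streets that can all reach one another.

    `pairs` is an iterable of (street, street) intersections, each treated
    as an undirected edge.
    """
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)

    best = set()
    visited = set()
    for start in graph:
        if start in visited:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        component = set()
        stack = [start]
        while stack:
            street = stack.pop()
            if street in component:
                continue
            component.add(street)
            stack.extend(graph[street] - component)
        visited |= component
        if len(component) > len(best):
            best = component
    return best


if __name__ == "__main__":
    print(largest_component([
        ("High Street", "Interesting Road"),
        ("Interesting Road", "Main Street"),
        ("Elm Street", "Oak Street"),
    ]))
```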
The largest connected subgraph is then visualized using a
Radial Layout algorithm provided by the Prefuse graph
visualization framework (http://prefuse.sourceforge.net). The
graph is initially centered on the start street but will
automatically adjust its focus to center around the most
recently selected street.
Ignoring its general lack of usefulness (at the time of this
writing), there are several problems with the application worth
noting:
Using regular expressions instead of a custom parser
means that parsing mistakes are not uncommon. The
online version of the applet has a voting option enabled
for particularly error-prone regular expressions, where
more than one page must agree on the analysis for it to
form part of the final graph. This eliminates some
mistakes but at the cost of many valid intersections. In
the future, the applet should make this behavior
optional.
Due to the regular expressions used, the application
works only with English text and English street-naming
conventions. When examining the output, it is also
obvious that regional variations of English make a
difference. American English, British English, and
Australian English often have slightly different ways of
referring to street relationships. I've tried to allow for
some of the more common variations.
Google's 10-word limit ["The 10-Word Limit" in Chapter
1] means that previously traversed street names cannot be
filtered out effectively by packing them into the query,
each prefixed with a minus (-) sign. Because the street
traversal algorithm deliberately tries streets closest to
the initial street, the search results returned on those
streets often include pages already processed. These
duplicate pages are filtered from further processing, but
getting search results back for already-processed pages
wastes precious Google API key usage juice.
Some streets are hard to disambiguate. Highly generic
street names such as Main Street or High Street can
pollute the connectivity graph. So, for example, if
Interesting Road is found to intersect with High Street,
there may be several High Streets found in the results
other than the one coming off Interesting Road, thus
producing connectivity via a High Street that is not
even indirectly connected to Interesting Road and is
therefore highly uninteresting.
Having more specific constraints in the search can help,
but that will reduce the overall quantity of results.
Another approach would be to prune connections that
have low connectivity with other parts of the graph. This
will be the case with streets that come off the "wrong"
High Street. Again, this would increase overall quality
but at the cost of quantity, with inevitable false
positives.
The pattern matching (using regular expressions) for
street name and type is fairly limited. Examples of
streets the current pattern matching will not catch are
Route N or Highway N and single-name streets such as
Broadway.
3.6.2. Running the Hack
Point your web browser at
http://richard.jones.name/google-hacks/google-cartography/applet/mapping.html.
You will need a recent version of the Java
plug-in (1.3.x is not new enough). You can
download the JRE, which includes the Java
plug-in, from
http://java.sun.com/j2se/1.4.2/download.html.
You'll also need your own Google API key
["Using Your Google API Key" in Chapter 9].
When the "Enter parameters" applet window appears (shown in
Figure 3-7), enter your Google API key and adjust the
maximums per instructions on the Google Cartography page.
Now, type in your starting street and any additional search
criteria by which to narrow the search.
Figure 3-7. Enter a starting point of interest and
anything that might narrow down the
possibilities
After a little churning, the applet will display a map for your
street and region of interest similar to the one in Figure 3-6.
Be warned that the applet may use large
amounts of bandwidth, depending on the
parameters you enter.
Richard Jones
Hack 53. Capture the Map
Capture the Map (http://www.capturethemap.de; Flash required)
is altogether a little silly, but it's a spot of fun nevertheless. In
this strategy game, similar in flavor to the popular (at least it was
when I was growing up) board game of Risk
(http://www.google.com/search?q=risk+board+game), you
attempt near-global domination by taking turns placing pins into
a map of the world, claiming more territory than your opponent.
Would that it were that easy?
You don't simply get to place pins into the map; that's where
Google comes in. You and your opponent battle it out by
supplying queries to be run against the Google index. The first
nine results are localized according to where in the world the
server hosting the resulting page lives, each represented by a
pushpin at a particular latitude and longitude on the map. For
example, if you searched for news and one of the first nine
results were the BBC News home page, you'd find yourself with a
pin in the U.K.: York, to be precise. A CNN hit would net you a pin
in Reston, Virginia.
But simply placing a pin isn't enough to maintain your hold on
any particular spot. A well-placed query can replace your pin
with that of your opponent. Perhaps he searched for CNN
directly and not only knocked your Reston pin out of its spot, but
also chalked up three Reston pins of his own. The only way to
protect your pins is by collecting several in one spot; for each
pin placed, you also claim one of the adjacent squares. Amass a
three-by-three grid of squares and you're safe from any further
attack.
The game is over when either player is out of pins. The player
with a higher sum of placed pins, captured squares, and saved
captured squares (three-by-three grids) is the winner.
Figure 3-8 shows a game of Capture the Map underway. Blue
(that's me) is in the midst of placing pins, mostly in Reston, VA,
thanks to a search for CNN. Notice the draggable magnifying
glass over the United States, revealing overlaid details in the
form of larger squares showing the number of pins at each spot.
A scorecard beneath each player's search box shows the number of
pins placed, number of squares captured, and number of squares
captured and saved.
Figure 3-8. Capture the Map turns Google into
Risk
Capture the Map is surprisingly addictive, as you try to noodle
which search will return the maximum number of results from the
same geographical location. You'll likely spend hours just trying
to take down your opponent's all-but-three-by-three grid in
Melbourne, Australia.
Chapter 4. News and Groups
Hacks 54-58
Section 4.2. Google News
Section 4.3. Google News Search Syntax
Section 4.4. Advanced News Search
Section 4.5. Making the Most of Google News
Section 4.6. Receive Google News Alerts
Section 4.7. Beyond Google for News Search
Hack 54. Scrape Google News
Hack 55. Visualize Google News
Section 4.10. Google Groups
Section 4.11. 10 Seconds of Hierarchy Funk
Section 4.12. Browsing Groups
Section 4.13. Google Groups Search Syntax
Section 4.14. Advanced Groups Search
Hack 56. Go Deeper into Groups with Google Groups 2
Hack 57. Scrape Google Groups
Hack 58. Simplify Google Groups URLs
Hacks 54-58
The Internet is a worldwide conversation, and nowhere is that
better reflected than in the flow of news coverage by "official"
news sources and bloggers alike, as well as the tangled
discussions of Usenet news and mailing lists. Google trawls
through our conversations, threads them together, tidies them up
(just a tad), and reflects them back at us in Google News and
Google Groups.
4.2. Google News
At the time of this writing, Google News
(http://news.google.com) culls around 4,500 news sources from
The Scotsman to The China Daily, The New York Times to The
Minneapolis Star Tribune.
The front page, shown in Figure 4-1, is updated algorithmically
several times a day, without any involvement by puny humans
(aside, of course, from those writing the news in the first place).
The "most relevant news" rises to the top.
Figure 4-1. The Google News front page
Stories are organized into clusters, drawing together coverage
and photographs from various news sources around the Web.
Click the "all n related" link for a list of all stories falling within
that cluster. Click "sort by date" to see how the story unfolded
across sources over time.
All of this doesn't apply just to the front page, but to all of the
newspaper-like sections within: World, U.S., Business, Sci/Tech,
Sports, Entertainment, and Health.
For a text-only and PDA/smartphone-
friendlier version of Google News, click the
Text Version link in the left column or point
your browser at
http://news.google.com/news?ned=tus.
You may notice that it takes a little longer
to load; this is because it combines all
sections, from Top Stories to Health, into
one text-only page.
4.3. Google News Search Syntax
When you search Google News, the default is to search for your
query keywords anywhere in the news article's headline, story
text, source, or URL.
A search for iht will find stories appearing in the
International Herald Tribune
(http://www.iht.com) even if "iht" appears
nowhere in the headline, story, or source's
proper name.
Google News Search uses basic boolean just like Google's Web
Search ["Basic Boolean" in Chapter 1].
Google News supports the following special search syntax:
intitle:
Finds words in an article headline.
intitle:beckham
An allintitle: variation finds stories wherein all the words
specified appear in an article headline, effectively the same as
using intitle: before each keyword.
allintitle:miners strike benefits
intext:
Finds search terms in the body of a story.
intext:"crude oil"
An allintext: variation finds stories where all your search
keywords appear in article text, effectively the same as using
intext: before each keyword.
allintext:US stocks rebound
inurl:
Looks for particular keywords in a news story's URL:
ipod inurl:reuters
source:
Finds articles from a particular source. Unfortunately,
Google News does not offer a list of its over 4,500
sources, so you'll have to guess a little. Also, you need
to replace any spaces in the source's name with
underscore characters; e.g., The New York Times
becomes new_york_times (case insensitive).
miners source:international_herald_tribune
"international space station" source:new_york_times
location:
Filters articles from sources located in a particular
country or state. For country names consisting of more
than one word, replace any spaces with underscore
characters; e.g., South Africa becomes south_africa
(case insensitive). In the case of state names, use
official abbreviations like ca for California and id for
Idaho.
"organic farming" location:france
election 2004 location:ca
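The underscore convention for source: and location: is easy to get wrong by hand, so here is a small Python sketch that builds those terms. The helper names are made up for this example, and dropping a leading "The" is an assumption generalized from the single New York Times example above; Google News does not document the rule here.

```python
def _slug(name):
    """Lowercase a name and join its words with underscores."""
    return "_".join(name.lower().split())


def source_term(publication):
    """Build a source: term, e.g. The New York Times -> new_york_times."""
    words = publication.split()
    # Assumption: a leading "The" is dropped, as in the example in the text.
    if words and words[0].lower() == "the":
        words = words[1:]
    return "source:" + _slug(" ".join(words))


def location_term(place):
    """Build a location: term, e.g. South Africa -> south_africa."""
    return "location:" + _slug(place)


if __name__ == "__main__":
    print('"international space station"', source_term("The New York Times"))
    print('"organic farming"', location_term("South Africa"))
```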
4.4. Advanced News Search
Google Advanced News Search, shown in Figure 4-2, is much
like the Advanced Web Search. It provides access to the Google
News special syntax from the comfort of a web form. You'll notice
a set of fields and pull-down menus associated with Date; use
these to search for articles published in the last hour, day, week,
month, or between any two days in particular.
Figure 4-2. The Google News Advanced Search
form
Fill in the fields, click the Search button, and notice how your
query is represented in the search box on the results page.
4.5. Making the Most of Google News
The best thing about Google News is its clustering capability.
On an ordinary news search engine, a breaking news story can
overwhelm search results. For example, in late July 2002, a
story broke that hormone replacement therapy might increase
the risk of cancer. Suddenly, using a news search engine to find
the phrase "breast cancer" was an exercise in futility, because
dozens of stories around the same topic were clogging the
results page.
That doesn't happen when you search the Google News engine
because Google groups similar stories by topic. You'd find a
large cluster of stories about hormone replacement therapy, but
they'd be in one place, leaving you to find other news about
breast cancer.
Some searches cluster easily; they're specialized or tend to
spawn limited topics. But other queries (such as "George Bush")
spawn lots of results and several different clusters. If you need
to search for a famous name or a general topic (such as crime,
for example) narrow your search results in one of the following
ways:
Add a topic modifier that will significantly narrow your
search results, as in: "George Bush" environment crime
arson.
Limit your search with one of the special syntaxes, for
example: intitle:"George Bush".
Limit your search to a particular source. Be warned that,
while this works well for a major breaking news story, you
might miss local stories. If you're searching for a major
American story, CNN is a good choice (source:cnn). If
the story you're researching is more international in
origin, the BBC works well (source:bbc_news).
4.6. Receive Google News Alerts
Google Alerts keep tabs on your Google News searches [Hack
#59], notifying you if any news stories appear matching your
search criteria. They're easy to set up, alter, and delete, and free.
4.7. Beyond Google for News Search
After a long dry spell, news search engines have popped up all
over the Internet. Here are my top two:
Rocketinfo (http://www.rocketnews.com)
Does not use the most extensive sources in the world,
but lesser-known press release outlets (such as PETA)
and very technical outlets (e.g., OncoLink, BioSpace,
Insurance News Net) are to be found here. Rocketinfo's
main drawback is its limited search and sort options.
Yahoo! Daily News (http://dailynews.yahoo.com)
Sports its source list right on the advanced search page.
A 30-day index means that sometimes you can find
things that have slipped off the other engines. Provides
free news alerts for registered Yahoo! users. One
drawback is that Yahoo! Daily News has few technical
sources, which means stories sometimes appear over
and over in search results.
Hack 54. Scrape Google News
Scrape Google News Search results to get at the latest from
thousands of aggregated news sources.
Google News, with its thousands of news sources worldwide, is a
veritable treasure trove for any news hound. However, because
you can't access Google News through the Google API [Chapter
9], you'll have to scrape your results from the HTML of a Google
News results page. This hack does just that, gathering up
results into a comma-delimited file suitable for loading into a
spreadsheet or database. For each news story, it extracts the
title, URL, source (i.e., news agency), publication date or age of
the news item, and an excerpted description.
Because Google's Terms of Service prohibit the automated
access of their search engines except through the Google API,
this hack does not actually connect to Google. Instead, it works
on a page of results that you've saved from a Google News
Search that you've run yourself. Simply save the results page as
HTML source using your browser's File → Save As...
command.
Make sure the results are listed by date instead of relevance.
When results are listed by relevance, some of the descriptions
are missing because similar stories are clumped together. You
can sort results by date by choosing the "Sort by date" link on
the results page or by adding &scoring=d to the end of the results
URL. Also, make sure you're getting the maximum number of
results by adding &num=100 to the end of the results URL. For
example, Figure 4-3 shows results of a query for election 2004,
the latest on the 2004 United States Presidential
Election, something of great import at the time of this writing.
Figure 4-3. Google News results for election
2004, sorted by date
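If you'd rather not edit the address bar by hand, a few lines of Python can assemble such a URL for pasting into your browser. This is only a sketch: the scoring=d and num=100 parameters come from the text above, while the q parameter name and the function name are assumptions for this example. The script builds a string and nothing more; it does not contact Google.

```python
from urllib.parse import urlencode

NEWS_SEARCH = "http://news.google.com/news?"


def news_results_url(query, by_date=True, count=100):
    """Build a Google News results URL to open by hand in a browser."""
    params = [("q", query)]  # assumed name of the query parameter
    if by_date:
        params.append(("scoring", "d"))  # list results by date
    params.append(("num", str(count)))   # ask for up to `count` results
    return NEWS_SEARCH + urlencode(params)


if __name__ == "__main__":
    print(news_results_url("election 2004"))
```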
Bear in mind that scraping web pages is a
brittle occupation. A single change in the
HTML code underlying the standard
Google News results page and you're more
than likely to end up with no results.
At the time of this writing, a typical Google News Search result
looks a little something like this:
<a href="http://www.townhall.com/columnists/phyllisschlafly/ps2004
Show Me State debate spotlights conservative issues</a><br><font s
<font color=#6f6f6f>Town Hall, DC -</font> <nobr>8
minutes ago</nobr></font><br><font size=-1><b>...</b>
The <b>2004</b> <b>election</b> presents a stark choice to voters
on many issues. The second debate highlighted those differences on
<b>...</b> </font><br>
While for most of you this is utter gobbledygook, it is probably of
some use to those trying to understand the regular expression
matching in the following script.
4.8.1. The Code
Save this code to a file called news2csv.pl:
#!/usr/bin/perl
# news2csv.pl
# Google News Results exported to CSV suitable for import into Exc
# Usage: perl news2csv.pl < news.html > news.csv
print qq{"title","link","source","date or age", "description"\n};
# Map HTML entities back to the characters they stand for.
my %unescape = ('&lt;'=>'<', '&gt;'=>'>', '&amp;'=>'&',
  '&quot;'=>'"', '&nbsp;'=>' ');
my $unescape_re = join '|' => keys %unescape;
my $results = join '', <>; # slurp the saved results page from STDIN
$results =~ s/($unescape_re)/$unescape{$1}/migs; # unescape HTML
$results =~ s![\n\r]! !migs; # drop spurious newlines
# Each result: link, title, source, date or age, then the description.
while ( $results =~ m!<a href="([^"]+)" ?>(.+?)</a><br>(.+?)<nobr>(.+?)</nobr>.*?<br>(.+?)<br>!migs ) {
  my($url, $title, $source, $date_age, $description) =
    ($1||'',$2||'',$3||'',$4||'', $5||'');
  $title =~ s!"!""!g; # double escape " marks
  $description =~ s!"!""!g;
  my $output =
    qq{"$title","$url","$source","$date_age","$description"\n};
  $output =~ s!<.+?>!!g; # drop all HTML tags
  print $output;
}
4.8.2. Running the Script
Run the script from the command line ["Running the Hacks" in
the Preface], specifying the Google News results HTML
filename and the name of the CSV file that you wish to create or
to which you wish to append additional results. For example,
using news.html as our input and news.csv as our output:
$ perl news2csv.pl < news.html > news.csv
Leaving off the > and CSV filename sends the results to the
screen for your perusal.
4.8.3. The Results
The following are some of the 20,300 results returned by a
Google News Search for election 2004 and using the HTML page
of results shown in Figure 4-3:
$ perl news2csv.pl < news.html
"title","link","source","date or age", "description"
"Bush and Kerry on the razor's edge","http://www.dallasnews.com/sh
opinion/columnists/mdavis/stories/101304dnedidavis.97af9.html","Da
(subscription),<A0>TX<A0>- ","12 minutes ago","... But just as the
is a referendum on Mr. Bush, debate analysis depends on ... Tonigh
whether the Bush debate grade for 2004 is generally ... "
"MINNESOTA: Notes and quotes from battleground Minnesota in ...",
"http://www.twincities.com
/mld/twincities/news/state/minnesota/9902651.htm","Pioneer Press (
MN<A0>- ","16 minutes ago","... the state. Both sides had assumed
easily go Democratic, as it had for every presidential election si
"MINNESOTA: Notes and quotes from battleground Minnesota in ...","
duluthsuperior.com/mld/duluthsuperior/news/politics/9902651.htm","
A0>MN<A0>- ","19 minutes ago","... the state. Both sides had assum
would easily go Democratic, as it had for every presidential elect
...
Each listing actually occurs on its own
line.
4.8.4. Hacking the Hack
You'll want to leave most of the news2csv.pl script alone, since
it's been built to make sense out of the Google News formatting.
If you don't like the way the program organizes the information
that's taken out of the results page, you can change it. Just
rearrange the variables on the following line, sorting them in any
way that you choose. Be sure to keep a comma between each
one.
my $output =
qq{"$title","$url","$source","$date_age","$description"\n};
For example, perhaps you want only the URL and title. The line
should read:
my $output =
qq{"$url","$title"\n};
That \n specifies a new line, and the $ characters specify that
$url and $title are variable names; keep them intact.
Of course, now your output won't match the header at the top of
the CSV file, which by default is:
print qq{"title","link","source","date or age", "description"\n};
As before, simply change this to match, as follows:
print qq{"url","title"\n};
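If you'd rather leave the Perl untouched, the columns can also be picked out of the finished CSV afterwards. The following Python sketch does that with the standard csv module, which takes care of the doubled quote marks for you. The function name select_columns is made up for this example, and the column names are the ones in the default header printed by news2csv.pl.

```python
import csv
import io


def select_columns(csv_text, wanted):
    """Return CSV text holding only the `wanted` columns, in that order."""
    # skipinitialspace copes with the space before "description" in the
    # header line that news2csv.pl prints.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=wanted,
        quoting=csv.QUOTE_ALL,   # quote every field, like the Perl script
        extrasaction="ignore",   # silently drop the columns not asked for
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        '"title","link","source","date or age", "description"\n'
        '"A ""quoted"" title","http://example.com/a","Example News",'
        '"8 minutes ago","Some text"\n'
    )
    print(select_columns(sample, ["link", "title"]), end="")
```

The example row is invented; feed the function the contents of your own news.csv instead.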
Hack 55. Visualize Google News
Watch stories aggregated by Google News unfold over time,
coverage broaden and fade, and hotspots emerge and fade again
into the background.
Newsmap (http://www.marumushi.com/apps/newsmap) is a
whizbang Flash-based treemap representation
(http://www.cs.umd.edu/hcil/treemap/index.shtml) of the stories
flowing through Google News. The Newsmap home page
describes it best:
Treemaps are traditionally space-constrained
visualizations of information. Newsmap's objective takes
that goal a step further and provides a tool to divide
information into quickly recognizable bands which, when
presented together, reveal underlying patterns in news
reporting across cultures and within news segments in
constant change around the globe.
Point your web browser at the Newsmap page and click the
LAUNCH button to begin. Figure 4-4 shows Newsmap in action.
Figure 4-4. Newsmap's Standard banded layout,
focusing on U.K. coverage of world, nation, and
business news
Each color-coded band (you'll have to take our word for its being
in color) represents a Google News section: from top to bottom
are World, Nation, Business, Technology, Sports, Entertainment,
and Health. Notice that I've only selected the first three by
checking their associated checkboxes at the bottom-right. Also
notice that I've selected news only from the U.K. in the
Countries tab across the top.
The colors appear in a gradient from brightest ("less than 10
minutes ago") to darkest ("more than 1 hour ago") such that the
latest stories stand right out.
The more substantial the band and bigger the enclosed headline,
the greater the number of related stories. You can easily spot
the freshest and most covered stories: they're the big, bright
blocks.
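The "bigger means more stories" idea can be sketched as a one-dimensional slice layout, in which each story cluster gets a share of a band's width proportional to its number of related stories. This Python sketch only illustrates that proportionality; it is not Newsmap's actual layout code, and the function name and data shape are assumptions.

```python
def slice_layout(clusters, total_width=100.0):
    """Lay clusters side by side, sized by their story counts.

    `clusters` is a list of (headline, story_count) pairs. Returns a list
    of (headline, x_offset, width) tuples filling `total_width`.
    """
    total = sum(count for _, count in clusters)
    if total == 0:
        return []
    boxes = []
    x = 0.0
    for headline, count in clusters:
        width = total_width * count / total
        boxes.append((headline, x, width))
        x += width
    return boxes


if __name__ == "__main__":
    for box in slice_layout([("Big story", 3), ("Small story", 1)]):
        print(box)
```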
Hover your mouse over any story for a brief description drawn
from the primary source (the story around which others are
clustered), as chosen by Google News.
There's also a Squarified version (Figure 4-5) that I prefer: more
so than with the Standard version (Figure 4-4), you are able to
see the spread of coverage across all news categories. Switch
between the two layouts by clicking the appropriate Layout
button near the bottom-right.
Figure 4-5. Newsmap's Squarified layout,
drawing from U.S. coverage of news across all
Google News categories
Newsmap provides a fascinating bird's-eye view of news as it
unfolds on the Web. Here are a couple of my favorite Newsmap
settings:
Select only one news category (World works best) and draw
coverage in from two or three countries. Set the layout
to Squarified. Now take a gander at the headlines and
notice how they differ in title and coverage by country.
Select only one news category and one country from
which to draw sources. Set the layout to Standard. Now,
meander back through the archive (bottom-left) day-by-
day or hour-by-hour and watch how the stories unfold
over time: bands widen and narrow, hotspots appear and
disappear, and the headline changes right along with the
primary source.
4.10. Google Groups
Usenet groups, text-based discussion groups covering literally
hundreds of thousands of topics, have been around since long
before the World Wide Web. Deja News used to be the repository
of Usenet information until it sold off its archive to Google in
early 2001. Google filled it out still further and relaunched it as
Google Groups (http://groups.google.com). Its search interface,
shown in Figure 4-6, is rather different from the Google Web
Search, as all messages are divided into groups, and the groups
themselves are divided into topics called hierarchies.
Figure 4-6. The Google Groups home page
The Google Groups archive begins in 1981 and covers up to the
present day. Just shy of 850 million messages are archived. As
you might imagine, that's a pretty big archive, covering literally
decades of discussion. Stuck in an ancient computer game?
Need help with that sewing machine you bought in 1982? You
might be able to find the answers here.
Google Groups also allows you to participate in Usenet
discussions, handy because not all ISPs provide access to
Usenet these days (and even those that do tend to limit the
number of newsgroups they carry). See the Google Groups
posting FAQ
(http://groups.google.com/googlegroups/posting_faq.html) for
instructions on how to post to a newsgroup. You'll have to start
by locating the group to which you want to post, and that means
using the hierarchy.
4.11. 10 Seconds of Hierarchy Funk
There are regional and smaller hierarchies, but the main ones
are alt, biz, comp, humanities, misc, news, rec, sci, soc, and talk.
Most web groups are created through a voting process and are
put under the hierarchy that's most applicable to the topic.
4.12. Browsing Groups
From the main Google Groups page, you can browse through the
list of groups by picking a hierarchy from the front page. You'll
see that there are subtopics, sub-subtopics, sub-sub-subtopics,
and, well, you get the picture. For example, in the comp
(computers) hierarchy, you'll find the subtopic comp.sys, or
computer systems. Beneath that lie 75 groups and subtopics,
including comp.sys.mac, a branch of the hierarchy devoted to the
Macintosh computer system. There are 24 Mac subtopics, one
of which is comp.sys.mac.hardware, which has, in turn, three
groups beneath it. Once you've drilled down to the most specific
group applicable to your interests, Google Groups presents the
postings themselves, sorted in reverse chronological order.
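If it helps to see the drill-down spelled out, here is a minimal Python sketch that lists each level you pass through on the way to a group. The function name hierarchy_path is made up for illustration; it only splits the dotted name and knows nothing about which groups actually exist.

```python
def hierarchy_path(group_name):
    """Return every level of the hierarchy leading down to a group.

    hierarchy_path is a hypothetical helper: it works purely on the
    dotted name and does not check that any of the levels exist.
    """
    parts = group_name.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


for level in hierarchy_path("comp.sys.mac.hardware"):
    print(level)
```

Each printed line corresponds to one click deeper into the comp hierarchy described above.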
This strategy works fine when you want to read a slow (i.e.,
containing little traffic) or moderated group, but when you want
to read a busy, free-for-all group, you may wish to use the
Google Groups Search engine. The search on the main page
works much like the regular Google search; the differences are
the Google Groups tab and the associated group and posting
date that accompany each result.
The Advanced Groups Search
(http://groups.google.com/advanced_group_search), however,
looks much different. You can restrict your searches to a certain
newsgroup or newsgroup topic. For example, you can restrict
your search as broadly as the entire comp hierarchy (comp* would
do it) or as narrowly as a single group such as
comp.robotics.misc. You can restrict messages to subject and
author, or restrict them by message ID.
Of course, any options on the Advanced
Groups Search page can be expressed via
a little URL hacking ["Understanding
Google URLs" in Chapter 1].
Possibly the biggest difference between Google Groups and
Google Web Search is the date searching. With Google Web
Search, date searching is notoriously inexact; date refers to
when a page was added to the index rather than when the page
was created. Each Google Groups message is stamped with the
day that it was actually posted to the newsgroup. Thus, the date
searches on Google Groups are accurate and indicative of when
content was produced. And, thankfully, they use the more
familiar Gregorian dates rather than the Google Web Search's
Julian dates [Hack #16].
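To see why Gregorian dates are the friendlier option, here is a rough Python sketch of the conversion you would otherwise have to do for the Web Search's daterange: syntax. The names to_julian_day and from_julian_day are invented for this example, and whole-day Julian Day Numbers are assumed.

```python
from datetime import date

# Difference between Python's proleptic Gregorian ordinal and the
# Julian Day Number (whole days; the JDN nominally starts at noon).
JDN_OFFSET = 1721425


def to_julian_day(d):
    """Convert a Gregorian date to a whole Julian Day Number."""
    return d.toordinal() + JDN_OFFSET


def from_julian_day(jdn):
    """Convert a whole Julian Day Number back to a Gregorian date."""
    return date.fromordinal(jdn - JDN_OFFSET)


start = to_julian_day(date(1998, 6, 30))
end = to_julian_day(date(1998, 7, 3))
print("daterange:%d-%d" % (start, end))
```

With Google Groups you can skip this step entirely and simply enter the calendar dates.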
4.13. Google Groups Search Syntax
By default, Google Groups looks for your query keywords
anywhere in the posting subject or body, group name, or author
name. Groups search uses the same sort of basic Boolean as
Google Web Search ["Basic Boolean" in Chapter 1].
Google Groups is an archive of
conversations. Thus, when you're
searching, you'll be more successful if you
try looking for conversational and informal
language, not the carefully structured
language you'll find on Internet sites (well,
some Internet sites, anyway).
And, thanks to some special syntax, you can do some precise
searching if you know the magic incantations:
insubject:
Searches posting subjects for query words.
insubject:rocketry
group:
Restricts your search to a certain group or set of groups
(topic). The * (asterisk) wildcard modifies a group:
syntax to include everything beneath the specified group
or topic. rec.humor* or rec.humor.* (effectively the same)
will find results in the group rec.humor, as well as
rec.humor.funny, rec.humor.jewish, and so forth.
group:rec.humor*
group:alt*
group:comp.lang.perl.misc
author:
Specifies the author of a newsgroup post. This can be a
full or partial name, even an email address.
author:fred
author:"fred flintstone"
author:[email protected]
4.13.1. Mixing syntaxes in Google Groups
Google Groups is much more friendly to syntax mixing ["Mixing
Syntax" in Chapter 1] than Google Web Search. You can mix any
two or more syntaxes together in a Google Groups Search, as
exemplified by the following typical searches:
intitle:literature group:humanities* author:john
intitle:hardware group:comp.sys.ibm* pda
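If you find yourself composing mixed queries like these over and over, a small helper can assemble them. What follows is only a sketch: groups_query and its parameters are hypothetical names, it uses the insubject:, group:, and author: syntaxes described above, and it does nothing more than build the text you would type into the search box.

```python
def groups_query(keywords="", insubject=None, group=None, author=None):
    """Assemble a Google Groups query string from optional parts.

    groups_query is a hypothetical helper; it only concatenates the
    special syntaxes and plain keywords into one query string.
    """
    terms = []
    if insubject:
        terms.append("insubject:" + insubject)
    if group:
        terms.append("group:" + group)
    if author:
        # Multiword author names need quoting, as in author:"fred flintstone".
        if " " in author:
            terms.append('author:"%s"' % author)
        else:
            terms.append("author:" + author)
    if keywords:
        terms.append(keywords)
    return " ".join(terms)


print(groups_query("pda", insubject="hardware", group="comp.sys.ibm*"))
print(groups_query(group="rec.humor*", author="fred flintstone"))
```

Keeping the pieces as separate arguments makes it easy to vary one restriction, say the group, while leaving the rest of the query alone.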
4.13.1.1 Some common search scenarios
There are several ways you can mine Google Groups for
research information. Remember, though, to view any information
that you get here with a certain amount of skepticism. Usenet is
just hundreds of thousands of people tossing around links; in
that respect, it's just like the Web.
4.13.1.2 Tech support
Ever used Windows and discovered that there's some program
running that you've never heard of? Uncomfortable, isn't it? If
you're wondering if HIDSERV is something nefarious, Google
Groups can tell you. Just search Google Groups for HIDSERV.
You'll find that plenty of people had the same question before
you did, and it's been answered.
I find that Google Groups is sometimes more useful than
manufacturers' web sites. For example, I was trying to install a
set of flight devices for a friend: a joystick, throttle, and rudder
pedals. The web site for the manufacturer couldn't help me figure
out why they weren't working. I described the problem as best I
could in a Google Groups search, using the name of the parts and
the manufacturer's brand name, and although it wasn't easy, I was
able to find an answer.
Sometimes your problem isn't as serious but it's just as
annoying; you might be stuck in a computer game. If the game
has been out for more than a few months, your answer is
probably in Google Groups. If you want the answer to an entire
game, try the magic word walkthrough. So if you're looking for a
walkthrough for Quake II, try the search "quake ii" walkthrough.
(You don't need to restrict your search to newsgroups;
walkthrough is a word strongly associated with gamers.)
4.13.1.3 Finding commentary immediately
after an event
With Google Groups, date searching is very precise (unlike date
searching Google's Web index), so it's an excellent way to get
commentary during or immediately after events.
Barbra Streisand and James Brolin were married on July 1,
1998. Searching for "Barbra Streisand" "James Brolin" between
June 30, 1998 and July 3, 1998 leads to over 48 results,
including reprinted wire articles, links to news stories, and
commentary from fans. Searching for "barbra streisand" "james
brolin" without a date specification finds more than 1,800
results.
Usenet is also much older than the Web and is ideal for finding
information about an event that occurred before the Web.
Coca-Cola released New Coke in April 1985. You can find information
about the release on the Web, of course, but finding
contemporary commentary would be more difficult. After some
playing around with the dates (just because it's been released
doesn't mean it's in every store) I found plenty of commentary
about New Coke in Google Groups by searching for the phrase
"new coke" during the month of May 1985. Information included
poll results, taste tests, and speculation on the new formula.
Searching later in the summer yields information on Coke
re-releasing old Coke under the name "Coca-Cola Classic."
4.14. Advanced Groups Search
The Advanced Groups Search, shown in Figure 4-7, is much like
the Advanced Web Search and Advanced News Search.
Figure 4-7. The Advanced Groups Search form
Rather than fiddling about with the special syntax detailed
earlier, simply fill out the form, hit the Search button, and let
Google Groups compose the query for you. You can restrict your
search to a specific newsgroup or section of hierarchy (e.g.,
comp.os.*), particular person, a language, or posts arriving in the
past 24 hours, week, month, three months, six months, or year.
You can even search for a particular message if you know the
message ID. And since Usenet can be just as wooly as the Web,
you might decide to turn on SafeSearch.
Hack 56. Go Deeper into Groups with
Google Groups 2
Google Groups 2 merges Usenet news with mailing lists.
Google Groups is a great way to research tech problems or get
help with game walkthroughs and other topics of geek
discussion. But the first version of Google Groups doesn't for the
most part index mailing list archives; the 800-pound gorilla in
that space is Yahoo! Groups, which specializes in mailing lists
(tens of thousands of them!). Google might just be a challenger
yet, however, with the release of Google Groups 2.
Google Groups 2 (http://groups-beta.google.com) is in beta, as
you might suspect given the URL; at least it is at the time of this
writing. That's why we're covering it separately rather than
simply doing away with the previous bit on Google Groups.
As you can see in Figure 4-8, Google Groups 2 starts out with a
subject index that looks a bit like Yahoo! Groups (or Yahoo!, for
that matter). Topics include Arts and Entertainment, Business
and Finance, and Computers. As you start browsing and
exploring Google Groups 2, you'll notice that it consists of a
combination of Usenet newsgroups (in standard Usenet
hierarchy: alt, comp, news, etc.) and mailing lists.
Figure 4-8. The Google Groups 2 home page
There are a couple of mailing lists in
Google Groups 2 made to look like Usenet
newsgroups: the obviously unofficial
google.public.bork.bork.bork is an example.
Just because it looks like a Usenet
newsgroup, don't assume that it is.
Each newsgroup is accompanied by a brief description so that
you don't feel like you're fumbling about in the dark. Email-based
groups stand out ever so slightly: while they are also
accompanied by a brief description, they are not listed by
hierarchy and indicate how many people are subscribed to the
list.
At this point, Google Groups 2 is a new
offering, so you won't find many mailing
lists that have large numbers of members.
Expect that to change over time.
Click on a group and you'll see a list of the latest topics with
excerpts on the left and older, though still active, topic titles on
the right. Click the "read more >>" link associated with a topic
to read individual postings.
For longer discussions, you may want to see the messages in a
particular topic laid out in threads so that you can more easily
follow who was responding to whom. Click the "view as tree" link
just beneath the topic title. The window splits into two frames:
on the right is the same screen you just saw, while on the left is
a list of all the messages in the discussion, arranged in a tree or
threaded style. Click any message in the tree to read its
contents in the frame on the right.
4.15.1. Monitor Group Activity
A group's "About:" description provides just enough information
for you to decide whether or not to read on, but there's a page of
information for each group that you just don't want to miss. Click
the "more about this group" link next to the group's description
and you'll see a bird's-eye view of the group (Figure 4-9):
Where in the Usenet hierarchy it fits (if it's a newsgroup)
Activity level
Number of members (subscribers or guest members,
presumably based on postings)
Messages per month, with month-by-month details
Top posters this month and since the group began (or
was picked up by Google)
Figure 4-9. A bird's-eye view of a Google Group
No matter how appealing the topic, a mailing list or newsgroup is
going to have a minimum amount of usefulness if there's only a
message or so posted each month. Conversely, a mailing list
that gets 500 messages a month may be too busy for you to
receive individual postings by email. You may well decide to join
the list, but read it through the Google Groups site.
There's also the option of subscribing to a public group's
messages and topics via Atom
(http://help.blogger.com/bin/answer.py?answer=697&topic=36),
an XML-based syndication format. That way, you can keep tabs
on a set of groups alongside all those RSS feeds you're reading.
Find an RSS/Atom feed reader to suit your needs and p
at
http://www.atomenabled.org/everyone/atomenabled/in
c=5.
4.15.2. Search Group Messages
Confusingly enough, the Google Groups 2 home page has two
search boxes, both labeled Search Groups. At the time of this
writing, they both return the same results: a set of groups
matching your search criteria, followed by individual message
results, as shown in Figure 4-10.
Figure 4-10. Search results show groups first,
then messages
Google Groups 2 has an Advanced Search at
http://groups-beta.google.com/advanced_search. You'll notice that the
advanced search here looks much like the advanced search for
the older Google Groups; however, you cannot do the date
searching that you can do with the Google Groups Advanced
Search. You also cannot, at least not at the time of writing,
search by language. Bear in mind that Google Groups 2 is still in
beta; I expect that there will eventually be at least as much
functionality for it as there is for the regular Google Groups. In
the meantime, you can take advantage of the existing search
form, and try out the same Google Groups Search syntax in
Google Groups 2.
4.15.3. Sign In
Sign in (or sign up for a free Google account if you've not already
done so) and you'll be able to subscribe to groups and create
your own. You'll find a "Sign in" link at the top-right of the Google
Groups 2 home page.
To subscribe to a group, simply click the "Subscribe to this
group" link associated with any group. As a subscriber, you can
choose to receive no email at all (reading messages on the Web
instead), a summary of the day's activities, or a single daily
digest of all messages. There is currently no option for receiving
individual messages as they are posted to your group.
Google Groups 2 and Gmail [Chapter 6]
are a perfect combination, allowing you to
read mailing lists and newsgroup
messages in an email environment without
cluttering up your home or work inbox.
Create a new group of your very own by clicking the "Create a
new group" link in the sidebar on the left side of the Google
Groups 2 home page. You'll be guided through a two-step
process:
1. Set up your group. Give it a name (e.g., Google Hacks),
email address (e.g.,
[email protected]), and
description. Decide whether it should be public (anyone
can join, members can post, and the archive is public),
announcement only (the archive is public, but only
moderators can post), or restricted (membership is
invite-only and nothing appears in the directory or
shows up in search results).
2. Add members to your group, and Google Groups 2 will
invite them by email.
4.15.4. This Must Be the Future
It's not clear when Google is going to make this version of
Google Groups the dominant version; it could very well be so by
the time you read this. People are used to the old version, and
Version 2 still has some user interface issues to work through.
You're sure to see more and bigger mailing lists appearing in the
coming months. Email is still the killer app of the Internet, and
there's always room for another discussion.
Hack 57. Scrape Google Groups
Pull results from Google Groups searches into a comma-delimited
file.
It's easy to look at the Internet and say that it's a group of web
pages or computers or networks. But look a little deeper and
you'll see that the core of the Internet is discussions: mailing
lists, online forums, and even web sites, where people hold forth
in glorious HTML, waiting for people to drop by, consider their
philosophies, make contact, or buy their products and services.
Nowhere is the Internet-as-conversation idea more prevalent
than in Usenet newsgroups. Google Groups has an archive of
over 800 million messages from years of Usenet traffic. If you're
researching a particular time, searching and saving Google
Groups message pointers comes in really handy.
Because Google Groups is not searchable by the current version
of the Google API, you can't build an automated Google Groups
query tool without violating Google's Terms of Service. However,
you can scrape the HTML of a page you visit personally and
save to your hard drive.
The first thing that you need to do is run a Google Groups
Search. See the "Google Groups" section earlier in this chapter
for some hints on the best practices for searching this massive
message archive.
This hack works with Google Groups, not
Google Groups 2. While any sort of
scraping is brittle, we expect Version 2 to
change form many times in the very near
future and wanted to be sure you had the
best chance of success with this hack.
It's best to put pages that you're going to scrape in order of
date; that way if you're going to scrape more pages later, it's
easy to look at them and check the last date that the search
results changed. Let's say that you're trying to keep up with
uses of Perl in programming the Google API; your query might
look like this:
perl group:google.public.web-apis
On the right side of the results page is an option to sort either
by relevance or date; click the "Sort by date" link. Your results
page should look something like Figure 4-11.
Figure 4-11. The results of a Google Groups
Search, sorted by date
Save this page to your hard drive, naming it something
memorable, like groups.html.
Scraping is brittle at best. A single change
in the HTML code underlying Google
Groups pages and the script won't get very
far.
At the time of this writing, a typical Google Groups Search result
looks like this:
<a href=/groups?q=perl+group:google.public.web-apis&hl=en&lr=&c2co
safe=off&scoring=d&selm=bfd91813.0408311406.21d2bb89%40posting.goo
=1>queries or results ?</a><font size=-1><br> <b>...</b>
Yet when making a query, via the <b>perl</b> Net::Google module, s
max_results to 50 works fine and returns 50 results, which was not
<b>...</b> <br><font color=green><a href=/groups?hl=en&
lr=&c2coff=1&safe=off&group=google.public.web-apis class=a>google.
web-apis</a> - Aug 31, 2004 by sean - <a href=/groups?hl=en&lr=&c2
&safe=off&threadm=bfd91813.0408311406.21d2bb89%40posting.google.co
&prev=/groups%3Fq%3Dperl%2Bgroup:google.public.web-apis%26hl%3Den%
%26safe%3Doff%26sa%3DG%26scoring%3Dd class=a>View Thread (1 articl
As with the HTML example given for Google News in [Hack #54],
this might be utter gobbledygook for some of you. Those of you
with an understanding of the code below should see why the
regular expression matching was written in the way it was.
4.16.1. The Code
Save the following code as groups2csv.pl:
#!/usr/bin/perl
# groups2csv.pl
# Google Groups results exported to CSV suitable for import into E
# Usage: perl groups2csv.pl < groups.html > groups.csv
use strict;
use warnings;

# The CSV Header.
print qq{"title","url","group","date","author","number of articles"\n};

# The base URL for Google Groups.
my $url = "http://groups.google.com";

# Rake in those results.
my($results) = (join '', <>);

# Perform a regular expression match to glean individual results.
# The ".*?" bridging </a> and <br> is an assumption; the printed line is cut off there.
while ( $results =~ m!<a href=(/groups[^\>]+?rnum=[0-9]+)>(.+?)</a>.*?<br>(.+?)<br>.*?<a href="?/groups.+?class=a>(.+?)</a> - (.+?) by (.+?)\s+.*?\(([0-9]+) article!mgis ) {
  my($path, $title, $snippet, $group, $date, $author, $articles) =
    ($1||'',$2||'',$3||'',$4||'',$5||'',$6||'',$7||'');
  $title =~ s!"!""!g;   # double escape " marks
  $title =~ s!<.+?>!!g; # drop all HTML tags
  print qq{"$title","$url$path","$group","$date","$author","$articles"\n};
}
4.16.2. Running the Hack
Run the script from the command line ["How to Run the Hacks"
in the Preface], specifying the Google Groups results filename
that you saved earlier and the name of the CSV file that you wish
to create or to which you wish to append additional results. For
example, use groups.html as your input and groups.csv as your
output:
$ perl groups2csv.pl < groups.html > groups.csv
Leaving off the > and CSV filename sends the results to the
screen for your perusal.
Using >> before the CSV filename appends the current set of
results to the CSV file, creating it if it doesn't already exist. This
is useful for combining more than one set of results, represented
by more than one saved results page:
$ perl groups2csv.pl < results_1.html > results.csv
$ perl groups2csv.pl < results_2.html >> results.csv
4.16.3. The Results
Scraping the results of a search for perl group:google.public.web-apis
for anything mentioning the Perl programming language on
the Google API's discussion forum looks like this:
$ perl groups2csv.pl < groups.html
"title","url","group","date","author","number of articles"
"queries or results ?","http://groups.google.com/groups?q=perl+gro
web-apis&hl=en&lr=&c2coff=1&safe=off&scoring=d&selm=bfd91813.
0408311406.21d2bb89%40posting.google.com&rnum=1","google.public.we
2004","sean",
"1"
...
"Re: Whats the Difference between using the API and ordinary ... "
com/groups?q=perl+group:google.public.web-apis&hl=en&lr=&c2coff=1&
off&scoring=d&selm=882fdb00.0405052309.44fe831b%40posting.google.c
"google.public.web-apis","May 6, 2004","tonio","4"
...
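For comparison, here is a rough Python sketch of the same scrape. It is not the hack's script: the pattern is a translation of the Perl regular expression, the ".*?" bridging the first </a> and <br> is an assumption about the markup, and SAMPLE is a made-up result shaped like the snippet shown earlier. The csv module takes care of doubling any quote marks in titles.

```python
import csv
import io
import re

BASE_URL = "http://groups.google.com"

# A Python rendering of the hack's regular expression. The ".*?" after
# the first </a> is an assumption about the page markup.
RESULT_RE = re.compile(
    r'<a href=(/groups[^>]+?rnum=[0-9]+)>(.+?)</a>.*?'
    r'<br>(.+?)<br>.*?<a href="?/groups.+?class=a>(.+?)</a> - (.+?) by '
    r'(.+?)\s+.*?\(([0-9]+) article',
    re.IGNORECASE | re.DOTALL,
)


def scrape(html):
    """Return one [title, url, group, date, author, articles] row per result."""
    rows = []
    for path, title, _snippet, group, date, author, articles in RESULT_RE.findall(html):
        title = re.sub(r"<.+?>", "", title)  # drop all HTML tags
        rows.append([title, BASE_URL + path, group, date, author, articles])
    return rows


def to_csv(rows):
    """Render rows as CSV text with every field quoted, header first."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    writer.writerow(["title", "url", "group", "date", "author", "number of articles"])
    writer.writerows(rows)
    return out.getvalue()


# Made-up result, shaped like the snippet shown earlier in this hack.
SAMPLE = (
    '<a href=/groups?q=perl&selm=abc%40example.invalid&rnum=1>'
    'queries or results ?</a><font size=-1><br> <b>...</b> a snippet '
    '<b>...</b> <br><font color=green><a href=/groups?hl=en'
    '&group=google.public.web-apis class=a>google.public.web-apis</a>'
    ' - Aug 31, 2004 by sean - <a href=/groups?hl=en'
    '&threadm=abc%40example.invalid class=a>View Thread (1 article)</a>'
)

print(to_csv(scrape(SAMPLE)), end="")
```

As with the Perl version, a single change to the underlying HTML is enough to stop the pattern from matching, so treat it as a starting point rather than something to rely on.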
Hack 58. Simplify Google Groups URLs
If the Google Groups URLs are a little too unwieldy, the Google
Groups Simplifier will cut them down to size.
Google Groups can produce some rather abominable URLs for
individual posts. One message can generate a URL along the
likes of this:
http://groups.google.com/groups?q=O%27reilly+%22mac+os+x%22
&hl=en&lr=&ie=UTF-8&oe=utf-8&scoring=d
&selm=ujaotqldn50o04%40corp.supernews.com&rnum=37
This is a difficult URL to save and reference, not to mention to
email to a colleague.
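To see just how much is packed into that URL, here is a small Python sketch that pulls the query string apart using the standard urllib.parse module. The helper name query_parameters is made up for illustration, and the sketch makes no claim about what each parameter means to Google; it simply lists them.

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "http://groups.google.com/groups?q=O%27reilly+%22mac+os+x%22"
    "&hl=en&lr=&ie=UTF-8&oe=utf-8&scoring=d"
    "&selm=ujaotqldn50o04%40corp.supernews.com&rnum=37"
)


def query_parameters(url):
    """Return the URL's query parameters as a dict of decoded value lists."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


params = query_parameters(url)
for name, values in params.items():
    print(name, "=", values[0])
```

Eight parameters ride along with a single message, which goes some way toward explaining why the URL is so awkward to pass around.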
URL-shortening services generate unique codes for each URL
provided, allowing extremely long URLs to be compressed into
much shorter, unique URLs. For example, Yahoo! News URLs can
be terribly long, but with TinyURL, they can be shortened to
something like http://tinyurl.com/2ph8.
These URLs are not private, so don't treat them as su
whacking (http://marnanel.org/writing/tinyurl-whackin
making up TinyURLs to find sites other people have fe
system.
Don't use these services unless you absolutely have
obscure the origin of the URL, making the URLs difficu
research. They do come in handy if you have to referen
cached by Google. For example, here's a URL for a cac
of http://www.oreilly.com: http://216.239.39.100/sea
q=cache:Tb0F_622vaYC:www.oreilly.com/+oreilly&hl=
8. While it's not as long as a typical Google Groups me
it's long enough to be difficult to paste into an email an
distribute.
TinyURL (http://www.tinyurl.com)
Shortens URLs to 23 characters. A bookmarklet is
available. The service converted the Google Groups URL
at the beginning of this hack to http://tinyurl.com/180q.
MakeAShorterLink (http://www.makeashorterlink.com)
Shortens URLs to about 40 characters, which, when
clicked, take the browser to gateway pages with details
of where they're about to be sent, after which the
browser is redirected to the desired URL.
MakeAShorterLink converted that Google Groups URL to
http://makeashorterlink.com/?A2FD145A1.
Shorl (http://www.shorl.com)
In addition to shortening URLs to about 35 characters,
Shorl tracks click-through statistics for the generated
URL. These stats can be accessed only by the person
who created the Shorl URL using a password generated
at the time. Shorl turned the Groups URL above into
http://www.shorl.com/jasomykuprystu, with the stats
page at http://shorl.com/stat.php?id=jasomykuprystu&pwd=jirafryvukomuha.
Note the embedded password (pwd=jirafryvukomuha).
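To get a feel for how a shortening service can hand out a unique code per URL, here is a toy Python sketch. It is not how TinyURL, MakeAShorterLink, or Shorl actually work; the Shortener class, the encode function, the base-36 counter scheme, and the short.example address are all assumptions made for illustration.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # 36 symbols


def encode(n):
    """Write a non-negative integer in base 36."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


class Shortener:
    """Toy in-memory shortener: one counter-based code per distinct URL."""

    def __init__(self, base="http://short.example/"):
        self.base = base
        self.by_code = {}
        self.by_url = {}

    def shorten(self, url):
        # Reuse the existing code if this URL has been seen before.
        if url not in self.by_url:
            code = encode(len(self.by_code) + 1)
            self.by_url[url] = code
            self.by_code[code] = url
        return self.base + self.by_url[url]

    def expand(self, short_url):
        return self.by_code[short_url[len(self.base):]]


service = Shortener()
long_url = "http://groups.google.com/groups?q=O%27reilly+%22mac+os+x%22&rnum=37"
short_url = service.shorten(long_url)
print(short_url, "->", service.expand(short_url))
```

A counter-based scheme like this one makes codes easy to guess, which is one reason not to treat shortened URLs as private.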
Chapter 5. Add-Ons
Hacks 59-70
Hack 59. Keep Tabs on Your Searches with Google
Alerts
Hack 60. Add Google to Your Toolbar or Desktop
Hack 61. Google Your Desktop
Hack 62. Google with Bookmarklets
Hack 63. Google from Word
Hack 64. Google by Email
Hack 65. Google by Instant Messenger
Hack 66. Google from IRC
Hack 67. Google on the Go
Hack 68. Visit the Google Labs
Hack 69. Find Out What Google Thinks ___ Is
Hack 70. The Search Engine Belt Buckle
Hacks 59-70
Google is of the Web, but that doesn't mean that it's trapped in
your browser. Google has become so much a part of the fabric of
our everyday lives that it shows up just about everywhere.
Google via email [Hack #64], through instant messaging [Hack
#65], from a chat room [Hack #66], on your mobile phone [Hack
#67]; you can even take it out for a night on the town in the form
of a groovy belt buckle [Hack #70].
This chapter is a tour of some of the more interesting ways in
which Google has leapt out of the pages of cyberspace and into
what hackers affectionately call meat space: everyday life, to you
and me.
Hack 59. Keep Tabs on Your Searches
with Google Alerts
Receive alerts in your email Inbox or RSS reader when
something you're after makes its way into the Google Web
index or a Google News story.
There are two classes of search that one generally runs in
Google. One is of the sort that you generally run just the once:
you're trying to find information on some topic, a phone number,
or that URL you visited yesterday but have since forgotten.
Then there's the search that you'd run every day if you could.
You're interested in a particular subject matter and want to know
the moment Google finds and indexes something new on the
topic.
There are a couple of services available that'll do the trick: the
official Google Alerts notifies you of any new web pages or news
stories matching your search criteria, while the third-party
service GoogleAlert watches only for new web pages but sports
a few extra features and delivery options not found in Google's
version.
Google's Web index does not consider a
page "new" based on the date it was
created. Instead, it considers a page new
based on the date that it was found and
indexed by the Googlebot. For more detail
on the difference, see "daterange:" under
the "Special Syntax" section in Chapter 1.
5.2.1. Google Alerts
Google Alerts (http://www.google.com/alerts), Google's official
alert offering, allows you to monitor both Google's Web index and
Google News stories. To set up a Google Alert, visit the Google
Alerts page. In the Create a Google Alert form (shown in Figure
5-1), type in a search query and choose whether to monitor
news, the Web, or both.
Figure 5-1. Monitor Google's Web Index and
Google News stories with Google Alerts
You have a choice when it comes to how often you're notified: as
it happens, once a day, or once a week. Provide your email
address, click the Create Alert button, and you'll receive a
confirmation email message a few moments later. Follow the link
provided in the email message (thus confirming that your email
address is legitimate and that it was you who requested the
Google Alert) and you're all set.
Be careful of the update frequency option;
monitoring Google News' 4,500 sources
for even a slightly common word, phrase,
or name and choosing to receive
notification "as it happens" can fill your
inbox with an avalanche of email.
Each alert that you receive includes your search query, the
found page's title, a snippet of content, and the URL (for web
index results) or story title, URL, description, and source (for
news stories). You can set up to 50 alerts per email address.
While all you need to sign up for Google Alerts is a valid email
address, there's also the option to sign into Google for a more
hands-on approach to managing your alerts. On the Google
Alerts page, click the "sign in to manage your alerts" link.
You'll need to already have or sign up for a free Google account.
But membership has its privileges:
Signing in provides you with a nice overview of your
active alerts.
If you don't sign in to manage your Google Alerts, you
can't edit the Google Alerts that you create. All you can
do is delete them and create new ones.
Google Alerts are delivered in HTML format as a default;
by signing in, you can switch to text and back again.
A Google account opens doors to other Google
properties such as Google Answers
(http://answers.google.com), allows you to create your
own Google Groups (http://groups.google.com)
[Hack #56], etc.
5.2.2. GoogleAlert
Before there were Google Alerts, there was GoogleAlert
(http://www.googlealert.com), a third-party alert service built on
the Google API [Chapter 9]. Being built on the API, it is limited
to the Google Web index and can't monitor anything like
Google News or whatever else Google decides to have the
official Google Alerts cover in the near future.
However, GoogleAlert does offer a few features worth checking
out. Start by signing up for a trial account, which affords you up
to three queries.
Full GoogleAlert accounts aren't free; they
range from $4.95 to $19.95 per month,
affording you more searches with more
extensive delivery options.
GoogleAlert's Advanced Search form, shown in Figure 5-2, is
similar to Google's Advanced Search page
(http://www.google.com/advanced_search), allowing you to build
Boolean searches, search for exact phrases, restrict your
searches to a particular site or date range, specify the file types
that you're interested in, and more, all without having to fiddle
with special syntax.
Figure 5-2. GoogleAlert offers search extras like
case-sensitivity and delivery via RSS
You can receive your alerts by email, of course. But you can also
subscribe to them as a syndicated RSS feed, or have a page on
your weblog pinged by TrackBack
(http://www.movabletype.org/trackback/beginners) notification
any time something new comes up.
Monitoring Google's Web index allows me to find search engines
or directories of information that I might have missed otherwise.
I keep tabs on Google to find pages that don't tend to appear out
of thin air all that often: those containing "online museum" or
"online reference service", for example.
I tend to use broader search queries when monitoring Google
News. While watching the Google Web index for "online database"
or "new search engine" might net me thousands of results (and
those long after the sites were actually new), online news stories
about new online databases and search engines tend to crop up
less frequently and provide a higher signal-to-noise ratio.
Hack 60. Add Google to Your Toolbar or
Desktop
Google from wherever you are without skipping a beat, thanks
to an assortment of browser search boxes, toolbars, and
desktop applications.
Just because Google is a web site doesn't mean that you have to
deal with it as such. Picture this: you're in the zone, working on
that big project, browser windows, spreadsheets, and slides
littering your desktop, both figuratively and literally. At some
point you need to check a fact, find a statistic, or read a news
story. Now, you could open yet another browser window, type
google.com, and search the Web, but that's about two steps too
many and (done repeatedly) may well disrupt your flow.
Take, for instance, what happened in the midst of writing this
hack. Up popped an instant message from a friend with a patent
number that he'd stumbled across and that he thought I might
find interesting. I could have opened another browser window,
browsed to http://www.google.com, Googled for "us patent
database", and searched for that particular patent. Instead, I
pasted that number into the Google Search box built right into
my Firefox web browser (shown in Figure 5-3), prefixed it with
patent, hit Return, and clicked the quick link to the patent at the
U.S. Patent Database ["Quick Links: Google by Numbers" in
Chapter 1]. This sort of flow, despite saving only a step or two at
most, is so catchy that it has become an integral part of my
workflow.
Figure 5-3. The Firefox built-in search box
expands to talk to just about any search engine
By toolbar, we actually mean one of several
add-ons: search boxes built into your web
browser (as described earlier), toolbars
that attach themselves to your browser
and provide search and other capabilities,
and search boxes that you can embed
elsewhere into your desktop, taskbar, or
toolbars.
This hack is a roundup of some of the Google toolbars that you'll
find available from Google itself or from third-party developers
(all of them are free for the taking).
5.3.1. Browser Search Boxes
The open source Mozilla
(http://www.mozilla.org/products/mozilla1.x), Firefox
(http://www.mozilla.org/products/firefox), and Netscape
(http://channels.netscape.com/ns/browsers) browsers, as well
as the Mac OS X Safari (http://www.apple.com/safari) and Opera
Software's Opera (http://www.opera.com) browsers, all sport a
search box either in the toolbar itself or a sidebar like the one
shown in Figure 5-4. Most often, too, the search box can be
configured to redirect your searches to any number of search
engines using a drop-down menu or preference setting.
Figure 5-4. The Mozilla Search sidebar is always
within easy reach
The Mozilla search functionality shown in Figure 5-4 can be
reached by various means: typing a search into the address bar
and clicking the Search button, selecting View → Sidebar and
choosing the Search tab, or highlighting a word in any web page,
right-clicking it, and selecting "Search Web for ..."
5.3.2. The Official Google Toolbar
The official Google Toolbar (http://toolbar.google.com; Windows
only) goes so much further than a simple search box for Google
and so is highly recommended for Windows users. It is
constantly updated, sometimes with Google slipping in little
Easter eggs (read: surprise features) just for fun. Figure 5-5
shows the Google Toolbar in action.
Figure 5-5. The official Google Toolbar; don't be
fooled by imposters
Search the Web, the site you're currently visiting, by country
domain, or use any of the various Google properties, including
Google Images, Directory, News, Froogle, and so forth. There are
even voting buttons (the happy- and sad-face icons) if you feel
like letting Google know what you think of the page you're on.
And the official toolbar is the only version that sports PageRank
["The Mysterious PageRank" in Chapter 8] for the page you're
currently visiting. Additional features include pop-up blocking,
auto-fill for web forms, highlighting of search keywords in
resulting pages, and more.
5.3.3. The Mozilla Googlebar
Mozilla/Firefox users (whether on Windows, Mac, or Unix/Linux)
can't use the Google Toolbar, but the Googlebar
(http://googlebar.mozdev.org) is a rather acceptable substitute.
It looks and acts very much like the official version, attempting
to match any functionality that it can (see Figure 5-6). It
doesn't, however, provide PageRank.
Figure 5-6. The Mozilla Googlebar mimics the
official offering as much as possible
Most people don't care one whit about PageRank. But
webmasters and researchers consider PageRank a critical
indicator of the importance of a particular web site or individual
page. If you're a Mozilla/Firefox user and just can't live without
PageRank, PRGooglebar (http://www.prgooglebar.org) is a
modification of the Googlebar that incorporates PageRank. And
then there's also the Google PageRank extension
(http://www.tapouillo.com/firefox_extension), embedding
PageRank into the Mozilla/Firefox status bar.
5.3.4. Desktop Search Boxes
And then there are applications that live not in the browser, but
on your desktop, in your taskbar or toolbar, or behind a
right-mouse-click.
Gophoria (http://www.gophoria.com; Windows only) turns any
text, URL, or image into a right-clickable search. Highlight some
text, right-click, and select "gophoria search" from the context
menu.
The official Google Deskbar (http://toolbar.google.com/deskbar;
Windows only) affords you a quick and simple interface to Google
Web Search, Google Images, Google News, and Froogle. Results
show up in a small preview window that looks like the familiar
Google results page.
Dave's Quick Search Deskbar (http://dqsd.net; Windows only),
as you may have guessed from the name, sits in your deskbar or
taskbar. But the simple name belies some incredible
functionality, not the least of which is special triggers and
switches. Enter a query in the box and hit the Enter key on your
keyboard to search Google; results pop up in your browser.
Feeling lucky? Add an exclamation mark (!) to your query (e.g.,
"washington post"!) and you'll be taken straight to the top-ranked
result. The # tacked on to George Bush (tx)# performs a
phonebook lookup. & dives into the Google cache. And there are
tons more; press F1 for a full list, shown in Figure 5-7.
Figure 5-7. Dave's Quick Search Deskbar
provides no-nonsense search shortcuts
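To make the trigger idea concrete, here is a minimal Python sketch of how a trailing character might be turned into a Google search URL. It is not how Dave's Quick Search Deskbar is implemented; the function name deskbar_url is hypothetical, and the mapping of # to the phonebook: syntax and of & to the cache: syntax is an assumption based on Google's special syntaxes of the time. The btnI parameter is the one behind the I'm Feeling Lucky button.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "http://www.google.com/search"


def deskbar_url(entry):
    """Translate a Deskbar-style entry into a Google URL (illustrative only)."""
    entry = entry.strip()
    if entry.endswith("!"):
        # Trailing "!": I'm Feeling Lucky, jump straight to the top result
        params = {"q": entry[:-1].strip(), "btnI": "I'm Feeling Lucky"}
    elif entry.endswith("#"):
        # Trailing "#": phonebook lookup (assumed to use the phonebook: syntax)
        params = {"q": "phonebook:" + entry[:-1].strip()}
    elif entry.endswith("&"):
        # Trailing "&": cached copy (assumed to use the cache: syntax)
        params = {"q": "cache:" + entry[:-1].strip()}
    else:
        # No trigger: a plain web search
        params = {"q": entry}
    return GOOGLE_SEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    for example in ('"washington post"!', "George Bush (tx)#", "www.oreilly.com&"):
        print(example, "->", deskbar_url(example))
```

Checking only the last character keeps the sketch short; a real tool would also need a way to search for queries that legitimately end in one of these characters.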
GGSearch (http://www.frysianfools.com/ggsearch; Windows
only) is one of the more gorgeous toolbars I've seen, with
several themes and skins. There are many search tools here,
accessible via a series of menus. It's rather mind-bending to
see Google's simple interface sliced and diced into menu item
within menu item (Figure 5-8), but it's a flexible, useful tool,
nevertheless.
Figure 5-8. GGSearch slices and dices Google's
search options
Mac OS X's built-in Search with Google service (Mac only ;-) )
puts Google only a three-finger key combo away. Highlight any
text in any OS X Service-aware application (e.g., not a Microsoft
Office for Mac OS X product) and select Services → Search
with Google from the application menu (i.e., Finder if you're
in the Finder, Safari if you're in Safari) or type Command-Shift-L
and you'll Google for the highlighted words. The results appear in
your browser window.
Hack 61. Google Your Desktop
Google your desktop and the rest of your filesystem, mailbox,
and instant messenger conversations, even your browser cache.
Not content just to help you find things on the Internet, Google
takes on that teetering pile on your desktop: your computer's
desktop, that is.
The Google Desktop (http://desktop.google.com) is your own
private little Google server. It sits in the background, slogging
through your files and folders, indexing your incoming and
outgoing email messages, listening in on your instant
messenger chats, and browsing the Web right along with you.
Just about anything you see and summarily forget, the Google
Desktop sees and memorizes: it's like a photographic memory
for your computer.
And it operates in real time.
Beyond the initial sweep, that is. When you first install Google
Desktop, it makes use of any idle time to meander your
filesystem, email application, instant messages, and browser
cache. Imbued with a sense of politeness, the indexer shouldn't
interfere at all with your use of your computer; it only springs
into action when you step away, take a phone call, or doze off for
30 seconds or more. Pick up the mouse or touch the keyboard
and the Google Desktop scuttles off into the corner, waiting
patiently for its next opportunity to look around.
Its initial inventory taken, the Google Desktop server sits back
and waits for something of interest to come along. Send or
receive an email message, strike up an AIM conversation with a
friend, or get a start on that PowerPoint presentation and it'll be
noticed and indexed within seconds.
The Google Desktop full-text indexes:
Text files, Microsoft Word documents, Excel workbooks,
and PowerPoint presentations living on your hard drive
Email handled through Outlook or Outlook Express
AOL Instant Messenger (AIM) conversations
Web pages browsed in Internet Explorer
Additionally, any other files you have lying about (photographs,
MP3s, movies) are indexed by their filename. So while the Google
Desktop can't tell a portrait of Uncle Alfred (uncle_alfred.jpg)
from a song by "Uncle Cracker"
(uncle_cracker__double_wide__who_s_your_uncle.mp3), it'll find
both in a search for uncle.
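As a rough illustration of filename-only indexing, the following Python sketch splits each filename into lowercase terms and matches a query word against them. How Google Desktop actually tokenizes filenames isn't documented here, so the splitting rule and the names filename_terms and search_filenames are assumptions made for the example.

```python
import re


def filename_terms(filename):
    """Split a filename into lowercase terms on any non-alphanumeric character."""
    return {term for term in re.split(r"[^0-9a-z]+", filename.lower()) if term}


def search_filenames(query, filenames):
    """Return the filenames that contain the query word as a whole term."""
    wanted = query.lower()
    return [name for name in filenames if wanted in filename_terms(name)]


if __name__ == "__main__":
    files = [
        "uncle_alfred.jpg",
        "uncle_cracker__double_wide__who_s_your_uncle.mp3",
        "budget_2004.xls",
    ]
    # Both "uncle" files match; nothing about their contents is consulted
    print(search_filenames("uncle", files))
```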
And the point of all this is to make your computer searchable
with the ease, speed, and familiar interface you've come to
expect of Google. The Google Desktop has its own home page on
your computer, shown in Figure 5-9, whether you're online or not.
Type in a search query just like you would at Google proper and
click the Search Desktop button to search your personal index.
Or click Search the Web to send your query out to Google.
Figure 5-9. The Google Desktop home page
But we're getting a little ahead of ourselves here.
Let's take a few steps back, download and install the Google
Desktop, and work our way back to searching again.
5.4.1. Installing the Google Desktop
The Google Desktop is a Windows-only application, requiring
Windows XP or Windows 2000 Service Pack 3 or later. The
application itself is tiny, but it'll consume about 500 MB of room
on your hard drive and works best with 400 MHz of computing
horsepower and 128 MB of memory.
Point your browser at http://desktop.google.com, download, and
run the Google Desktop installer. It'll install the application,
embed a little swirly icon in your taskbar, and drop a shortcut
onto your desktop. When it's finished installing and setting itself
up, your default browser pops open and you're asked to set a few
preferences, as shown in Figure 5-10.
Figure 5-10. Set Google Desktop search
preferences
Click the Set Preferences and Continue button and you'll be
notified that the Google Desktop is starting its initial indexing
sweep. Click the Start Searching button to get to the Google
Desktop home page (Figure 5-9).
5.4.2. Searching Your Desktop
From here on out, any time you're looking for something on your
computer, rather than invoking Windows search and waiting
impatiently while it grinds away (and you grind your teeth) and
returns with nothing, double-click the swirly Google Desktop
taskbar icon and Google for it. Don't bother combing through an
endless array of Inboxes, Outboxes, Sent Mail, and folders or
wishing you could remember whether your AIM buddy suggested
starving or feeding your cold. Click the swirl.
Figure 5-11 shows the results of a Google Desktop search for
hacks. Notice that it found 16 email messages, 2 files, 1 chat,
and 1 item in my IE browsing history matching my hacks query.
As you can probably guess from the icons to the left of each
result, the first three are an AIM chat, an HTML file (most likely
from my browser's cache), and an email message. These are
sorted by date, but you can easily make a switch to relevance by
clicking the "Sort by relevance" link at the top-right of the
results list.
Figure 5-11. Google Desktop search results
Figures 5-12, 5-13, and 5-14 show each of
these individual search results as I clicked through them. Note
that each is displayed in a manner appropriate to the content.
Click the "Chat with..." link shown in Figure 5-12 to launch an
AIM conversation with the person at hand.
Figure 5-12. An AIM instant message
Cached pages are presented, as shown in Figure 5-13, in much
the same manner as they are in the Google cache.
Figure 5-13. A cached web page
The various Reply, Reply to All, Forward, etc., links associated
with an individual message result (Figure 5-14) work: click them
and the appropriate action will be taken by Outlook or Outlook
Express.
Figure 5-14. An email message
5.4.3. Google Desktop Search Syntax
It just wouldn't be a Google search interface if there weren't
special search syntax to go along with it.
The Boolean OR works as expected (e.g., hacks OR snacks), as
does negation (e.g., hacks -evil).
A filetype: operator restricts searches to only a particular type
of file: filetype:powerpoint or filetype:ppt (.ppt being the
PowerPoint file extension) both find only Microsoft PowerPoint
files while filetype:word or filetype:doc (.doc being the Word file
extension) both restrict results to Microsoft Word documents.
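The syntax described above can be sketched as a toy query parser in Python. This is only an illustration of the three operators, not Google Desktop's parser: the names parse_query and matches are hypothetical, only the four filetype aliases mentioned in the text are mapped, and every term adjacent to an OR is lumped into a single group of alternatives.

```python
# Aliases taken from the text: powerpoint/ppt and word/doc
FILETYPE_ALIASES = {"powerpoint": "ppt", "ppt": "ppt", "word": "doc", "doc": "doc"}


def parse_query(query):
    """Break a query into required, alternative (OR), and excluded terms."""
    parsed = {"required": [], "alternatives": [], "excluded": [], "filetype": None}
    tokens = query.split()
    for i, tok in enumerate(tokens):
        if tok == "OR":
            continue
        beside_or = (i > 0 and tokens[i - 1] == "OR") or (
            i + 1 < len(tokens) and tokens[i + 1] == "OR"
        )
        if tok.lower().startswith("filetype:"):
            name = tok.split(":", 1)[1].lower()
            parsed["filetype"] = FILETYPE_ALIASES.get(name, name)
        elif tok.startswith("-") and len(tok) > 1:
            parsed["excluded"].append(tok[1:].lower())
        elif beside_or:
            parsed["alternatives"].append(tok.lower())
        else:
            parsed["required"].append(tok.lower())
    return parsed


def matches(parsed, text, extension):
    """Check one document (its text and file extension) against a parsed query."""
    words = set(text.lower().split())
    if parsed["filetype"] and extension.lower().lstrip(".") != parsed["filetype"]:
        return False
    if any(word in words for word in parsed["excluded"]):
        return False
    if not all(word in words for word in parsed["required"]):
        return False
    if parsed["alternatives"] and not any(w in words for w in parsed["alternatives"]):
        return False
    return True


if __name__ == "__main__":
    print(parse_query("hacks OR snacks"))
    print(parse_query("hacks -evil filetype:powerpoint"))
```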
5.4.4. Searching the Web
Now you'd think I'd hardly need to cover Googling ... and you'd
be right. But there's a little more to Googling via the Google
Desktop than you might expect. Take a close look at the results
of a Google search for hacks shown in Figure 5-15.
Figure 5-15. Google Desktop Web Search
results pack a little extra
Come on back when you're through with that double take.
If you missed it, notice the new quick links ["Quick Links" in
Chapter 1]: "27 results stored on your computer."
Yes, those are the same results (and then some, given my
indexer was hard at work) returned in my earlier Google Desktop
Search of my local machine. As an added reminder, they're
called out by that Google Desktop swirl. Click a local result and
you'll end up in just the same place as before: all 27 results, an
HTML page, or Microsoft Word document. Click any other quick
link or search result and they'll act in the manner that you'd
expect from any Google.com results.
5.4.5. Behind the Scenes
Now before you start worrying about the results of a local
search (or indeed your local files) being sent off to Google, read on.
What's actually going on is that the local Google Desktop server
is intercepting any Google Web searches, passing them on to
Google.com in your stead, and running the same search against
your computer's local index. It's then intercepting the Web
search results as they come back from Google, pasting in local
finds, and presenting the lot to you in your browser as a cohesive
whole.
All work involving your local data is done on your computer.
Neither your filenames nor your files themselves are ever sent
on to Google.com.
For more on Google Desktop and privacy, right-click the Google
Desktop taskbar swirl, select About, and click the Privacy link.
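The flow just described can be sketched in a few lines of Python. This is a conceptual model only, not Google Desktop's code: web_search stands in for the request to Google.com, local_index is a hypothetical dictionary mapping a query to locally indexed items, and search_with_local_results is a name invented for the example. The point it illustrates is that only the query text is handed to the web search, while the local lookup and the merging both happen on your machine.

```python
def search_with_local_results(query, web_search, local_index):
    """Combine web results with a locally computed quick link."""
    # Only the query string is passed along to the web search
    web_results = web_search(query)
    # The local lookup never leaves this function, let alone the computer
    local_hits = local_index.get(query.lower(), [])
    page = []
    if local_hits:
        page.append("%d results stored on your computer" % len(local_hits))
    page.extend(web_results)
    return page


if __name__ == "__main__":
    sent_to_web = []

    def fake_web_search(q):
        # Stand-in for Google.com; records exactly what it was given
        sent_to_web.append(q)
        return ["Web result 1 for " + q, "Web result 2 for " + q]

    index = {"hacks": ["chat with rael.txt", "hacks-outline.doc"]}
    for line in search_with_local_results("hacks", fake_web_search, index):
        print(line)
    print("Sent to the web search:", sent_to_web)
```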
5.4.6. Twiddling Knobs and Setting
Preferences
There are various knobs to twiddle and preferences to set
through the Google Desktop browser-based interface and
taskbar swirl.
Set various preferences in the Google Desktop Preferences
page. Click the Desktop Preferences link on the Google Desktop
home page or any results page to bring up the settings shown in
Figure 5-16.
Figure 5-16. Google Desktop Preferences
Hide your local results from sight when sharing Google Web
Search results with a friend or colleague by clicking the Hide link
next to any visible Google Desktop quick links. You can also
turn Desktop quick link results on and off from the Google
Desktop Preferences page.
Click the "Remove results" link next to the Search Desktop
button on the top-right of any results page and you'll be able to
go through and remove particular items from the Google Desktop
index, as shown in Figure 5-17. Do note that if you open or view
any of these items again, they'll once again be indexed and start
showing up in search results.
Figure 5-17. Removing items from your Google
Desktop index
Search, set preferences, check the status of your index, pause
or resume indexing, quit Google Desktop, or browse the "About"
docs by right-clicking the Google Desktop taskbar swirl and
choosing an item from the menu, shown in Figure 5-18.
Figure 5-18. The Google Desktop taskbar menu
gets you to knobs to twiddle and preferences to
set
When evaluating the Google Desktop as an interface to finding
needles in my personal haystack, one thing sticks in my mind: I
stumbled across an old email message that I was sure I'd lost.
5.4.7. See Also
The Google Desktop Proxy
(http://www.projectcomputing.com/resources/desktopProx
takes desktop searching beyond your own desktop. A
little proxy server sitting on your computer accepts
queries from other machines on the network, passes
them to the Google Desktop engine running locally, and
forwards the results on.
Hack 62. Google with Bookmarklets
Create interactive bookmarklets to perform Google functions
from the comfort of your own browser.
You probably know what bookmarks are. But what are
bookmarklets? Bookmarklets are like bookmarks but with an
extra bit of JavaScript magic added. This makes them more
interactive than regular bookmarks; they can perform small
functions like opening a window, grabbing highlighted text from a
web page, or submitting a query to a search engine. There are
several bookmarklets that allow you to perform useful Google
functions right from the comfort of your own browser.
If you're using Internet Explorer for
Windows, you're in gravy: all these
bookmarklets will most likely work as
advertised. But if you're using a less-
appreciated browser (such as Opera) or
operating system (such as Mac OS X), pay
attention to the bookmarklet requirements
and instructions; there may be special
magic needed to get a particular bookmarklet
working, or indeed, you may not be able to
use the bookmarklet at all.
Google Translate!
(http://www.microcontentnews.com/resources/translator.htm)
Puts Google's translation tools into a bookmarklet,
enabling one-button translation of the current web page.
Google Jump
(http://www.angelfire.com/dc/dcbookmarkletlab/Bookmarklets/scri
Prompts you for search terms, performs a Google
search, and takes you straight to the top hit thanks to
the magic of Google's I'm Feeling Lucky function.
The Dooyoo Bookmarklets collection
(http://dooyoo-uk.tripod.com/bookmarklets2.html)
Features several bookmarklets for use with different
search engines, two for Google. Similar to Google's
Browser Buttons, one finds highlighted text and the other
finds related pages.
Joe Maller's Translation Bookmarklets
(http://www.joemaller.com/translation_bookmarklets.shtml)
Translate the current page into the specified language
via Google or AltaVista.
Bookmarklets for Opera
(http://www.philburns.com/bookmarklets.html)
Includes a Google translation bookmarklet, a Google
bookmarklet that restricts searches to the current
domain, and a bookmarklet that searches Google
Groups. As you might imagine, these bookmarklets were
created for use with the Opera browser.
LuckyMarklets
(http://www.researchbuzz.org/archives/001414.shtml)
Tara's bookmarklets taking advantage of the I'm Feeling
Lucky feature in Google Web Search, Google News, and
Google Images.
Milly's Bookmarklets (http://www.imilly.com/bm.htm)
An incredible collection of bookmarklets for all things
Google: Web Search, Images, Directory, Definitions,
Cache, the Google site itself, and many more, Google or
otherwise.
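Since a bookmarklet is just a bookmark whose address is a javascript: URL, one can be assembled mechanically. The Python sketch below packs a small piece of JavaScript into such a URL; it is an illustration rather than any of the bookmarklets listed above, and the names make_bookmarklet and SITE_SEARCH_JS, along with the choice to search the current site via Google's site: syntax, are assumptions made for the example.

```python
from urllib.parse import quote

# Hypothetical bookmarklet body: prompt for terms, then search the current site
SITE_SEARCH_JS = """
(function () {
  var q = prompt('Search this site for:');
  if (q) {
    location.href = 'http://www.google.com/search?q=' +
      encodeURIComponent('site:' + location.hostname + ' ' + q);
  }
})();
"""


def make_bookmarklet(js_source):
    """Collapse JavaScript onto one line and wrap it in a javascript: URL."""
    one_line = " ".join(line.strip() for line in js_source.strip().splitlines())
    # Percent-encode spaces, braces, and the like so the result is a valid URL
    return "javascript:" + quote(one_line, safe="()=+.,;:'/?!*-_~")


if __name__ == "__main__":
    # Paste the printed text into the address field of a new bookmark
    print(make_bookmarklet(SITE_SEARCH_JS))
```

Joining the lines with spaces is safe here only because every statement in the snippet ends with a semicolon or a brace; JavaScript that relies on line breaks to end statements would need more care.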
Hack 63. Google from Word
Add a little Google to Microsoft Word.
You probably use Google a few dozen times a day. If you work a
lot within Microsoft Word, using Google usually means switching
over to your web browser, checking the results, and then going
back to Word. This hack will show you how to display the search
results in Word's New Document Task Pane.
This hack uses a plain-text .ini file to store data and some
Visual Basic for Applications (VBA) code that also uses
VBScript regular expressions.
This hack will work only with Word 2003
for Windows.
Using Google from Word requires a bit of setup, but once you've
installed the appropriate tools, you can use Google from within
any Word macro.
5.6.1. Install the Web Services Toolkit
First, install the free Microsoft Office 2003 Web Services Toolkit
2.01. Search for it on the Microsoft web site
(http://www.microsoft.com/downloads/) or Google it.
5.6.2. Create a New Template
Next, create a new template to hold your Google-related macros.
The Web Services Toolkit will create some code so that you can
work with Google. A separate template will help you keep track
of the code. Create a new, blank document and save it as a
Document Template named GoogleTools.dot.
5.6.3. Install the Google Interface VBA
Code
From your new GoogleTools.dot template, select Tools → Macro
→ Visual Basic Editor. The Web Services Toolkit will have
added a new item called Web Service References on the Tools
menu, as shown in Figure 5-19.
Figure 5-19. Creating a new reference for
accessing Google
Select Tools → Web Service References to display the dialog
shown in Figure 5-20. Enter google in the Keywords field and
click the Search button. When the web service is found, check
the box next to it, and click the Add button.
Figure 5-20. Locating the Google search web
service
When you click the Add button, you'll notice a flurry of activity
on your screen as the Web Services Toolkit installs several new
class modules into your template project, as shown in
Figure 5-21.
Figure 5-21. The code created by the Web
Services Toolkit
The Web Services Toolkit creates the
code, but the description it works from
actually comes from Google in the form of
Web Services Description Language
(WSDL). The Toolkit interprets this
information and generates the VBA code
needed to access the web service (in this
case, Google).
5.6.4. The Code
With the GoogleTools.dot template you created open, select Tools
→ Macro → Macros and insert the following code, which
consists of a procedure named GoogleToTaskPane and a supporting
function named StripHTML.
Make sure you replace the value insert
your key with your Google API developer's
key.
Sub GoogleToTaskPane( )
    Dim vSearchResults As Variant
    Dim v As Variant
    Dim sResults As String
    Dim sEntryName As String
    Dim sEntryURL As String
    Dim sLogFile As String
    Dim sSearchDisplayTitle As String
    Dim sSearchURL As String
    Dim i As Integer

    ' Google API variables
    Dim sGoogleAPIKey As String
    Dim sSearchQuery As String
    Dim lStart As Long
    Dim lMaxResults As Long
    Dim bFilter As Boolean
    Dim sRestrict As String
    Dim bSafeSearch As Boolean
    Dim sLanguageRestrict As String
    Dim sInputEncoding As String
    Dim sOutputEncoding As String
    Dim google_search As New clsws_GoogleSearchService

    ' Initialize variables
    sLogFile = "C:\google_taskpane.ini"
    sGoogleAPIKey = "insert your key"
    lStart = 1
    lMaxResults = 10
    bFilter = True
    sRestrict = ""
    bSafeSearch = False
    sLanguageRestrict = ""
    sInputEncoding = "UTF-8"
    sOutputEncoding = "UTF-8"

    ' Hide the Task Pane
    Application.CommandBars("Task Pane").Visible = False

    ' Remove existing items from New Document Task Pane,
    ' using the URLs and titles logged by the previous search
    For i = 0 To 9
        sEntryURL = System.PrivateProfileString( _
            FileName:=sLogFile, _
            Section:="GoogleTaskPane", _
            Key:="URLName" & CStr(i))
        sEntryName = System.PrivateProfileString( _
            FileName:=sLogFile, _
            Section:="GoogleTaskPane", _
            Key:="EntryName" & CStr(i))
        If Len(sEntryURL) > 0 Then
            Application.NewDocument.Remove _
                FileName:=sEntryURL, _
                Section:=msoBottomSection, _
                DisplayName:=sEntryName, _
                Action:=msoOpenFile
        End If
    Next i

    ' Get new search query; an empty query (or Cancel) stops here
    sSearchQuery = InputBox("Enter a Google query:")
    If Len(sSearchQuery) = 0 Then Exit Sub

    ' Get search results
    vSearchResults = google_search.wsm_doGoogleSearch( _
        str_key:=sGoogleAPIKey, _
        str_q:=sSearchQuery, _
        lng_start:=lStart, _
        lng_maxResults:=lMaxResults, _
        bln_filter:=bFilter, _
        str_restrict:=sRestrict, _
        bln_safeSearch:=bSafeSearch, _
        str_lr:=sLanguageRestrict, _
        str_ie:=sInputEncoding, _
        str_oe:=sOutputEncoding).resultElements

    ' Check for no results (error 9 is "Subscript out of range")
    On Error Resume Next
    v = UBound(vSearchResults)
    If Err.Number = 9 Then
        MsgBox "No results found"
        Exit Sub
    ElseIf Err.Number <> 0 Then
        MsgBox "An error has occurred: " & _
            Err.Number & vbCr & _
            Err.Description
        Exit Sub
    End If

    ' Add each result to the task pane
    ' and to the log file
    i = 0
    For Each v In vSearchResults
        sSearchURL = v.URL
        sSearchDisplayTitle = StripHTML(v.title)
        Application.NewDocument.Add _
            FileName:=sSearchURL, _
            Section:=msoBottomSection, _
            DisplayName:=sSearchDisplayTitle, _
            Action:=msoOpenFile
        System.PrivateProfileString( _
            FileName:=sLogFile, _
            Section:="GoogleTaskPane", _
            Key:="URLName" & CStr(i)) = sSearchURL
        System.PrivateProfileString( _
            FileName:=sLogFile, _
            Section:="GoogleTaskPane", _
            Key:="EntryName" & CStr(i)) = sSearchDisplayTitle
        i = i + 1
    Next v

    ' Show the New Document Task Pane
    CommandBars("Menu Bar").Controls("File").Controls("New...").Execute
End Sub

Function StripHTML(str As String) As String
    Dim re As Object
    Dim k As Long

    ' Reuse a running RegExp object if there is one; otherwise create it
    On Error Resume Next
    Set re = GetObject(Class:="VBScript.RegExp")
    If Err.Number = 429 Then
        Set re = CreateObject(Class:="VBScript.RegExp")
        Err.Clear
    ElseIf Err.Number <> 0 Then
        MsgBox Err.Number & vbCr & Err.Description
    End If

    ' Replace every match, not just the first one
    re.Global = True

    ' Check for common character entities by ASCII value
    For k = 33 To 255
        re.Pattern = "&#" & k & ";"
        str = re.Replace(str, Chr$(k))
    Next k

    ' Remove common HTML tags and any remaining entities
    re.Pattern = "<[^>]+?>|&[^;]+?;"
    str = re.Replace(str, vbNullString)
    StripHTML = str
End Function
This hack uses two parts of the Google search results: the URLs
and titles. Google formats the search result title as HTML, but
you can only put plain text in the Task Pane. The StripHTML
function uses a few simple VBScript regular expressions to
strip out common HTML tags (such as <b>) and replace
numeric character entities (such as &#64;) with their ASCII
character equivalents (in that case, @).
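For readers who want to experiment with the same idea outside Word, here is a minimal Python sketch of that two-pass cleanup. It mirrors the approach described above (decode numeric entities in the 33 to 255 range, then delete tags and whatever entities remain) using the same tag pattern as the macro; the name strip_html is an assumption made for the example, and this is not a general-purpose HTML sanitizer.

```python
import re

NUMERIC_ENTITY = re.compile(r"&#(\d+);")
TAG_OR_ENTITY = re.compile(r"<[^>]+?>|&[^;]+?;")


def strip_html(text):
    """Reduce an HTML-formatted title to plain text."""

    def decode(match):
        code = int(match.group(1))
        # Decode only the range the macro handles; leave the rest for pass two
        return chr(code) if 33 <= code <= 255 else match.group(0)

    # Pass one: numeric character entities become the characters they stand for
    text = NUMERIC_ENTITY.sub(decode, text)
    # Pass two: drop tags and any entities that are still left (e.g., &amp;)
    return TAG_OR_ENTITY.sub("", text)


if __name__ == "__main__":
    print(strip_html("<b>Google</b> Hacks &#64; O'Reilly"))
```

As in the macro, named entities such as &amp; are removed outright rather than translated, which is a reasonable trade-off for short result titles.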
It can be tricky to remove files from the Task Pane using VBA
unless you know their exact name. This macro, however, stores
the search results in a plain-text .ini file. The next time you do a
search, you can easily remove the previous results. The macro
uses a file named C:\google_taskpane.ini, which is defined in the
GoogleToTaskPane procedure.
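To picture what ends up in that log, the following Python sketch writes and reads back the same layout with the standard configparser module: one GoogleTaskPane section holding URLName0 through URLName9 and EntryName0 through EntryName9. The section and key names come from the macro; the functions save_results and load_results are hypothetical, and the sketch works on a string rather than on C:\google_taskpane.ini so that it can run anywhere.

```python
import configparser
import io

SECTION = "GoogleTaskPane"


def _new_parser():
    parser = configparser.ConfigParser(interpolation=None)  # URLs may contain %
    parser.optionxform = str  # keep key names such as URLName0 case-sensitive
    return parser


def save_results(results):
    """Serialize (url, title) pairs to INI text, numbering the keys from 0."""
    parser = _new_parser()
    parser[SECTION] = {}
    for i, (url, title) in enumerate(results):
        parser[SECTION]["URLName%d" % i] = url
        parser[SECTION]["EntryName%d" % i] = title
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


def load_results(ini_text):
    """Read back up to ten logged results, skipping slots with no URL."""
    parser = _new_parser()
    parser.read_string(ini_text)
    results = []
    for i in range(10):
        url = parser.get(SECTION, "URLName%d" % i, fallback="")
        title = parser.get(SECTION, "EntryName%d" % i, fallback="")
        if url:
            results.append((url, title))
    return results


if __name__ == "__main__":
    text = save_results([("http://www.oreilly.com/", "O'Reilly Media")])
    print(text)
    print(load_results(text))
```

Keeping the previous titles alongside the URLs matters because, as the macro shows, removing a Task Pane entry requires both the file name and the display name it was added with.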
5.6.5. Running the Hack
After you insert the code, switch back to Word. Next, select
Tools → Macro → Macros, choose GoogleToTaskPane, and
click the Run button to display the dialog shown in Figure 5-22.
Figure 5-22. Entering a Google search that will
display in the Task Pane
Enter your search terms and click the OK button. The New
Document Task Pane appears and displays the search results,
as shown in Figure 5-23. Hover your mouse over any of the
entries to display the URL. Click a URL to open the site in your
web browser.
Figure 5-23. Google results displayed in the
Task Pane
Every time you run a search, the macro removes the previous
results from the Task Pane. If you want to remove the previous
results without displaying new ones, click the Cancel button in
the dialog box shown in Figure 5-22.
To make s ure this handy mac ro loads
automatic ally when Word s tarts , put
GoogleTools .dot into your Startup folder,
typic ally C:\Documents and Setting\
<username>\Application
Data\Micros oft\Word\STARTUP\.
5.6.6. Hacking the Hack
To take this hack one step further, you can modify it to use the
currently selected text as the search text, rather than displaying
an input box and entering text.
The following macro, named GoogleSelectionToTaskPane, does a
Google search of the currently selected text and displays the
results in the Task Pane. The modified code is the part that trims
the selection and uses its text as the query.
Sub GoogleSelectionToTaskPane( )
Dim vSearchResults As Variant
Dim v As Variant
Dim sResults As String
Dim sEntryName As String
Dim sEntryURL As String
Dim sLogFile As String
Dim sSearchDisplayTitle As String
Dim sSearchURL As String
Dim i As Integer
' Google API variables
Dim sGoogleAPIKey As String
Dim sSearchQuery As String
Dim lStart As Long
Dim lMaxResults As Long
Dim bFilter As Boolean
Dim sRestrict As String
Dim bSafeSearch As Boolean
Dim sLanguageRestrict As String
Dim sInputEncoding As String
Dim sOutputEncoding As String
Dim google_search As New clsws_GoogleSearchService
' Initialize variables
sLogFile = "C:\google_taskpane.ini"
sGoogleAPIKey = "your_key_here"
lStart = 1
lMaxResults = 10
bFilter = True
sRestrict = ""
bSafeSearch = False
sLanguageRestrict = ""
sInputEncoding = "UTF-8"
sOutputEncoding = "UTF-8"
' Hide the Task Pane
Application.CommandBars("Task Pane").Visible = False
' Remove existing items from New Document Task Pane
For i = 0 To 9
sEntryURL = System.PrivateProfileString( _
FileName:=sLogFile, _
Section:="GoogleTaskPane", _
Key:="URLName" & CStr(i))
sEntryName = System.PrivateProfileString( _
FileName:=sLogFile, _
Section:="GoogleTaskPane", _
Key:="EntryName" & CStr(i))
If Len(sEntryURL) > 0 Then
Application.NewDocument.Remove _
FileName:=sEntryURL, _
Section:=msoBottomSection, _
DisplayName:=sEntryName, _
Action:=msoOpenFile
End If
Next i
' Move ends of selection to exclude spaces
' and paragraph marks (Chr$(13))
Selection.MoveStartWhile cset:=Chr$(32) & Chr$(13), _
Count:=Selection.Characters.Count
Selection.MoveEndWhile cset:=Chr$(32) & Chr$(13), _
Count:=-Selection.Characters.Count
' Get selection text for search
sSearchQuery = Selection.Text
If Len(sSearchQuery) = 0 Then Exit Sub
' Get search results
vSearchResults = google_search.wsm_doGoogleSearch( _
str_key:=sGoogleAPIKey, _
str_q:=sSearchQuery, _
lng_start:=lStart, _
lng_maxResults:=lMaxResults, _
bln_filter:=bFilter, _
str_restrict:=sRestrict, _
bln_safeSearch:=bSafeSearch, _
str_lr:=sLanguageRestrict, _
str_ie:=sInputEncoding, _
str_oe:=sOutputEncoding).resultElements
' Check for no results
On Error Resume Next
v = UBound(vSearchResults)
If Err.Number = 9 Then
MsgBox "No results found"
Exit Sub
ElseIf Err.Number <> 0 Then
MsgBox "An error has occurred: " & _
Err.Number & vbCr & _
Err.Description
Exit Sub
End If
' Add each result to the task pane
' and to the log file
i = 0
For Each v In vSearchResults
sSearchURL = v.URL
sSearchDisplayTitle = StripHTML(v.title)
Application.NewDocument.Add _
FileName:=sSearchURL, _
Section:=msoBottomSection, _
DisplayName:=sSearchDisplayTitle, _
Action:=msoOpenFile
System.PrivateProfileString( _
FileName:=sLogFile, _
Section:="GoogleTaskPane", _
Key:="URLName" & CStr(i)) = sSearchURL
System.PrivateProfileString( _
FileName:=sLogFile, _
Section:="GoogleTaskPane", _
Key:="EntryName" & CStr(i)) = sSearchDisplayTitle
i = i + 1
Next v
' Show the New Document Task Pane
CommandBars("Menu Bar").Controls("File").Controls("New...").Execute
End Sub
To help ensure a good Google search, the following two lines
pull in the two ends of the selection if they contain spaces or a
paragraph mark:
Selection.MoveStartWhile cset:=Chr$(32) & Chr$(13), _
Count:=Selection.Characters.Count
Selection.MoveEndWhile cset:=Chr$(32) & Chr$(13), _
Count:=-Selection.Characters.Count
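The effect of those two calls can be modeled on a plain string. The following Python sketch is an analogy rather than Word's actual implementation: it assumes a selection is a pair of character offsets, and the name trim_selection is made up for this example.

```python
def trim_selection(text, start, end):
    """Shrink the range [start, end) past spaces and carriage returns."""
    skip = " \r"  # space (32) and paragraph mark (13)
    # Like MoveStartWhile: advance the start while it sits on a skipped char.
    while start < end and text[start] in skip:
        start += 1
    # Like MoveEndWhile with a negative count: pull the end back.
    while end > start and text[end - 1] in skip:
        end -= 1
    return start, end
```

For example, trimming the whole of "  mary shelley \r" leaves a range that covers only "mary shelley". A selection made up entirely of spaces collapses to an empty range, which is the case the macro guards against with its Len(sSearchQuery) = 0 check.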
Andrew Savikas
Hack 64. Google by Email
Access 10 of Google's search results at a time via email.
Long before the Web existed, there was email. And now, thanks
to the Google API, there's Google email. Created by the team at
Cape Clear (http://capescience.capeclear.com/google),
CapeMail queries Google via email. Send email to
google@capeclear.com with the query in the subject line. You'll
receive a message back with the estimated results count and
the first 10 results.
Here's an excerpt from a search for Frankenstein:
Estimated Total Results Number = 591000
URL = "http://www.literature.org/authors/shelley-mary/frankenst
Title = "Online Literature Library - Mary Shelley - Frankenstein
Snippet = "Next Back Contents Home Authors Contact, Frankenstein
Mary Shelley. Preface; Chapter 1; Chapter 2; Chapter 3; Chapter 4
Chapter 5; Chapter 6; Chapter 7; Chapter ... "
URL = "http://www.nlm.nih.gov/hmd/frankenstein/frankhome.html"
Title = "Frankenstein Exhibit Home Page"
Snippet = "Table of Contents Introduction The Birth of Frankenst
The Celluloid Monster. Promise and Peril, Frankenstein: The Modern
Prometheus. ... "
URL = "http://www.sangfroid.com/frank/"
Title = "Frankenstein, or The Modern Prometheus"
Snippet = "1818 (this edition 1831) Frankenstein is the world-fa
story of a doctor whose brilliant mind gets the better of him. "
URL = "http://www.imdb.com/Title?0021884"
Title = "Frankenstein (1931)"
Snippet = "Frankenstein (1931) - Cast, Crew, Reviews, Plot Summa
Comments, Discussion, Taglines, Trailers, Posters, Photos, Showtim
Link to Official Site, Fan Sites. ... "
Like many other Google API applications, you can use CapeMail
only 1,000 times per day, since the Google API allows the use
of the key only that many times. Don't rely on this to the
exclusion of other ways to access Google. But if you're in a
situation where web searching is not quite as easy as sending an
email message (you're on the go with a mobile phone or PDA, for
example), this is a quick and easy way to interface with Google.
5.7.1. Hacking the Hack
CapeMail comes in handy with the combination of an email
application and a way to automate sending messages (Unix's
cron, for example). Say you're researching a particular topic, a
relatively obscure topic but one that does generate web page
results. You could set up your scheduler (or even your email
program, if it is able to send timed messages) to fire off a
message to CapeMail once a day, then gather and archive the
search results.
Further, you could use your email's filtering rules to divert the
CapeMail messages to their own folder for offline browsing. Make
sure your search is fairly narrow, though, because CapeMail
returns only 10 results at a time.
Google Alerts [Hack #59] are more
apropos for this sort of application, but
CapeMail was interesting enough to be
worth mentioning.
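As a rough illustration of the automation idea, the Python sketch below builds the kind of message a scheduled job might send. It only constructs the message; actually sending it would need an SMTP server of your own, which is not shown. The function name build_query_message and the sender address are hypothetical, and the only detail taken from the text is that the query goes in the subject line of a message to google@capeclear.com.

```python
from email.message import EmailMessage

CAPEMAIL_ADDRESS = "google@capeclear.com"  # address given in the text


def build_query_message(sender: str, query: str) -> EmailMessage:
    """Build (but do not send) a CapeMail request for the given query."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = CAPEMAIL_ADDRESS
    message["Subject"] = query  # CapeMail reads the query from the subject
    message.set_content("")  # empty body; the text mentions only the subject
    return message
```

A daily cron entry could run a small script that calls this function and hands the result to your mail system, with a mail filter on the receiving side filing the replies.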
Hack 65. Google by Instant Messenger
Accessing Google with AOL Instant Messenger.
If we're going to step out beyond the Google interface, why even
bother to use the Web at all? The Google API makes it possible
to access Google's information in many different ways.
Googlematic makes it possible to query Google from the comfort
of AOL Instant Messenger.
Here's how it works: send a message (a Google query) to the
instant messenger buddy googlematic. Googlematic will message
you back with the top result for your query. Reply with More and
you'll get more results formatted as a numbered list, as shown in
Figure 5-24.
Figure 5-24. Query to googlematic through AOL
Instant Messenger
Message with the number associated with a particular result for
further details, as shown in Figure 5-25.
Figure 5-25. Requesting further detail for a
googlematic result
You can find the Googlematic script, further instructions, and
links to required modules at
http://interconnected.org/googlematic.
5.8.1. The Code
Here's all there is to the code:
#!/usr/bin/perl -w
# googlematic.pl
# Provides an AIM interface to Google, using the Google SOAP API
# and POE to manage all the activity.
# Usage
# ./googlematic.pl &
# Requirements
# - Googlematic::IM, Googlematic::Responder, Googlematic::Search,
# which are all distributed with this script
# - CGI
# - HTML::Entities
# - Net::AOLIM
# - POE
# - SOAP::Lite
# - XML::Parser
# Essential configuration (below)
# - AIM username and password (used in Googlematic::IM)
# - Google API Developer Key (used in Googlematic::Search)
# Optional configuration (below)
# - Search request throttling (used in Googlematic::Search)
# - Limit of number of user sessions open (used in Googlematic::IM)
# - Time limit on a user session (used in Googlematic::Responder)
# (c) 2002 Matt Webb <[email protected]> All rights reserved
use strict;
use POE;
$| = 1;
use Googlematic::IM;
use Googlematic::Search;
# Configuration variables
$Googlematic::CONFIG = {
aim_username => "xxxxxxx",
aim_password => "xxxxxxx",
google_key => "your key goes here",
searches_per_hour => "35", # the Google limit is 1000/day
max_user_sessions => "5",
user_session_timeout => "120" # in seconds
};
# There are two POE sessions:
# 1 - Googlematic::IM, known as 'im', takes care of the Instant Messenger
# connection and looks after user sessions (which are created as
# POE sessions, and known as Responders).
POE::Session->create(
    package_states => [
        Googlematic::IM => [
            '_start', 'login_aim', 'loop', 'spawner',
            'handler_aim', 'send', '_child', '_stop', 'proxy'
        ]
    ]
);
# 2 - Googlematic::Search, known as 'google', takes care of the SOAP::Lite
# object making the searches on Google. Requests to it are sent by
# individual Responders.
POE::Session->create(
    package_states => [
        Googlematic::Search => [
            '_start', 'loop', 'search', 'reset'
        ]
    ]
);
# Run the POE machine.
$poe_kernel->run( );
exit;
Tara Calishain and Matt Webb
Hack 66. Google from IRC
Performing Google searches from IRC is not only convenient, but
also efficient. See how fast you can Google for something on
IRC and click on the URL highlighted by your IRC client.
When someone pops into your IRC channel with a question, you
can bet your life that 9 times out of 10, he could have easily
found the answer on Google. If you think this is the case, you
could tell him that, or you could do it slightly more subtly by
suggesting a Google search term to an IRC bot, which will then
go and look for a result.
Most IRC clients are capable of highlighting URLs in channels.
Clicking on a highlighted URL will open your default web browser
and load the page. For some people, this is a lot quicker than
finding the icon to start your web browser and then typing or
pasting the URL. More obviously, a single Google search will
present its result to everybody in the channel.
The goal is to have an IRC bot called GoogleBot that responds
to the !google command. It will respond by showing the title and
URL of the first Google search result. If the size of the page is
known, that will also be displayed.
5.9.1. The Code
First, unless you've already done so, you will need to grab a
copy of the Google Web APIs Developer's Kit
(http://www.google.com/apis/download.html), create a
Google account, and obtain a license key [Chapter 9]. As I write
this, the free license key entitles you to 1,000 automated
queries per day. This is more than enough for a single IRC
channel.
The googleapi.jar file included in the kit contains the classes that
the bot will use to perform Google searches, so you will need to
make sure this is in your classpath when you compile and run
the bot (the simplest way is to drop it into the same directory as
the bot's code itself).
The GoogleBot is built upon the PircBot Java IRC API
(http://www.jibble.org/pircbot.php), a framework for writing IRC
bots. You'll need to download a copy of the PircBot ZIP file, unzip
it, and drop pircbot.jar into the current directory, along with
googleapi.jar.
For more on writing Java-based bots with
the PircBot Java IRC API, be sure to
check out "IRC with Java and PircBot"
[Hack #35] in IRC Hacks (O'Reilly) by Paul
Mutton.
Create a file called GoogleBot.java:
import org.jibble.pircbot.*;
import com.google.soap.search.*;
public class GoogleBot extends PircBot {
    // Change this so it uses your license key!
    private static final String googleKey = "000000000000000000000";
    public GoogleBot(String name) {
        setName(name);
    }
    public void onMessage(String channel, String sender, String login,
            String hostname, String message) {
        message = message.toLowerCase( ).trim( );
        if (message.startsWith("!google ")) {
            String searchTerms = message.substring(8);
            String result = null;
            try {
                GoogleSearch search = new GoogleSearch( );
                search.setKey(googleKey);
                search.setQueryString(searchTerms);
                search.setMaxResults(1);
                GoogleSearchResult searchResult = search.doSearch( );
                GoogleSearchResultElement[] elements =
                        searchResult.getResultElements( );
                if (elements.length == 1) {
                    GoogleSearchResultElement element = elements[0];
                    // Remove all HTML tags from the title.
                    // (The tag pattern here is an assumed reconstruction.)
                    String title = element.getTitle( ).replaceAll("<.*?>", "");
                    result = element.getURL( ) + " (" + title + ")";
                    if (!element.getCachedSize( ).equals("0")) {
                        result = result + " - " + element.getCachedSize( );
                    }
                }
            }
            catch (GoogleSearchFault e) {
                // Something went wrong. Say why.
                result = "Unable to perform your search: " + e;
            }
            if (result == null) {
                // No results were found for the search terms.
                result = "I could not find anything on Google.";
            }
            // Send the result to the channel.
            sendMessage(channel, sender + ": " + result);
        }
    }
}
Your license key will be a simple string, so you can store that in
the GoogleBot class as googleKey.
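The bot's message handling boils down to two small steps: recognizing the !google command and formatting one result line. The Python sketch below isolates those steps so they can be tried without an IRC connection or a license key. The function names are hypothetical, and the result format (URL, title in parentheses, then the size when it is not "0") is an assumption based on the bot's described output.

```python
import re

COMMAND = "!google "


def parse_google_command(message):
    """Return the search terms for a !google message, or None otherwise."""
    text = message.strip()
    if text.lower().startswith(COMMAND):
        return text[len(COMMAND):].strip() or None
    return None


def format_result(url, title_html, cached_size="0"):
    """Format one result as 'URL (title) - size', omitting an unknown size."""
    title = re.sub(r"<.*?>", "", title_html)  # drop HTML tags from the title
    line = f"{url} ({title})"
    if cached_size != "0":
        line += f" - {cached_size}"
    return line
```

Unlike the Java bot, this sketch keeps the original case of the search terms; that is a design choice made here, not something the hack prescribes.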
You now need to tell the bot which channels to join. If you want,
you can tell the bot to join more than one channel, but remember,
you are limited in the number of Google searches that you can
do per day.
Create the file GoogleBotMain.java:
public class GoogleBotMain {
    public static void main(String[] args) throws Exception {
        GoogleBot bot = new GoogleBot("GoogleBot");
        bot.setVerbose(true);
        bot.connect("irc.freenode.net");
        bot.joinChannel("#irchacks");
    }
}
5.9.2. Running the Hack
When you compile the bot, remember to include both pircbot.jar
and googleapi.jar in the classpath:
C:\java\GoogleBot> javac -classpath .;pircbot.jar;googleapi.jar *.java
You can then run the bot like so:
C:\java\GoogleBot> java -classpath .;pircbot.jar;googleapi.jar GoogleBotMain
The bot will then start up and connect to the IRC server.
5.9.3. The Results
Figure 5-26 shows GoogleBot running in an IRC channel and
responding with the URL, title, and size of each of the results of a
Google search.
Figure 5-26. The GoogleBot performing an IRC-
related search
Performing a Google search is a popular task for bots to do. Take
this into account if you run your bot in a busy channel, because
there might already be a bot there that lets users search Google.
Paul Mutton
Hack 67. Google on the Go
Being on the go and away from your laptop or desktop doesn't
mean leaving Google behind.
As the saying goes, "You can't take it with you." Unless, that is,
you're talking about Google. Just because you've left your laptop
at home or at the office, that doesn't necessarily mean leaving
the Web and Google behind. So long as you have your trusty cell
phone or network-enabled PDA in your pocket, so too do you
have Google.
Whether you have a top-of-the-line Treo 600, Blackberry, or
Sidekick with integrated web browser; a base-model cell phone
that your carrier gave you for free; or anything in between,
chances are that you're able to Google on the go.
Google caters to the "on the go" crowd with its Google wireless
interfaces: a simpler, lighter, gentler PDA- and
smartphone-friendly version of Google, a WAP (read: wireless
Web) flavor for cell phones with limited web access, and an SMS
gateway for messaging your query to and receiving an almost
instantaneous response from Google. There's even a mobile
interface to Google's Froogle (http://froogle.google.com) product
search.
5.10.1. Google by PDA or Smartphone
Google PDA Search (http://www.google.com/palm) brings all the
power of Google to the PDA in your palm, hiptop on your belt, or
cell phone in your pocket.
Don't be fooled by the palm in the Google
PDA Search URL, which is an artifact of
Palm's majority mindshare at the time.
The interface will work with your Pocket
PC, Zaurus, Treo, or any other mobile
device that benefits from lighter web
pages.
Settle that "in like Flynn" versus "in like Flint" dinner-table
argument without leaving your seat. Find quickie reviews and
commentary on that Dustmeister 2000 vacuum before making
the purchase. Figure out where you've seen that bit-part actor
before without having to wait for the credits.
Your modern PDA and the smarter so-called smartphones sport a
full-fledged web browser on which you can surf all the Web has to
offer in living color, albeit substantially smaller. You find the usual
Address Bar, Back and Forward buttons, Bookmarks or Favorites,
and point-and-click (or point-and-tap, as the case may be)
hyperlinks. While the onboard browser might just be able to
handle the regular Google.com web pages, Google PDA
Search provides simpler, smaller, no-nonsense, plain HTML
pages. And results pages pack in fewer results for faster
loading.
Just point your mobile browser at http://google.com/palm, enter
your search terms, click the Google Search button, and up come
your results, as shown in Figure 5-27.
Figure 5-27. Google PDA Search (left) and
results (right) on a Nokia smartphone
You have the full range of Google search syntax [Chapter 1] and
complete web index available to you, although it might be more
than a little challenging to enter those quotes, colons,
parentheses, and minus signs.
5.10.2. Google by Cell Phone
If you have a garden-variety cell phone (of the sort your mobile
provider either gives away free with signup or charges on the
order of $40 for), you may yet find you have a built-in
browser...of a sort. Don't expect anything nearly as fast, colorful,
or feature-filled as your computer's web browser. This is a
text-only world, limited in both display and interactivity.
That all said, you have the wealth (if not the Technicolor) of the
Web right in your very pocket.
Step one, however, is to find the browser in the first place. It's
usually hidden in plain sight, tucked behind some
(possibly meaningless) moniker such as WAP, Web, Internet,
Services, Downloads, or a brand name such as mMode or
T-Zones. If nothing of the sort leaps out at you, look for an icon
sporting your cell phone provider's logo, take a stroll through the
menus, dig out your manual, or give your provider a ring (usually
611 on your cell phone).
Texting Sure Ain't QWERTY
Whether you're a 70-word-per-minute touch typist or
you hunt and peck your way through the QWERTY
keyboard, you're initially going to find texting a pokey
chore. Rather than the array of letters, numbers,
symbols, and shift keys on your computer keyboard,
everything you do on your cell phone is confined to
twelve keys: 0-9, *, and #. Frankly, it's an annoying
system to learn, but once you get used to it, it's not too
painful to use; some folks actually become rather adept
at it, rivaling their regular keyboarding speeds.
Look closely at your phone and you'll notice each
button also holds either a set of three to four alphabetic
characters or obscure symbols not unlike those you'd
expect to find on the UFO landing in your back yard.
Like your regular phone, the 1 button is devoid of letters
while 2 holds ABC, 3 DEF, and so on to 9 WXYZ.
When you're in web-browsing mode on your phone, you
can tap the 2 button once to type an A, twice in quick
succession for a B, and thrice for a C. Four times nets
you a 2. Keep going and you'll make it back through A,
B, C, and 2 again (on some phones encountering strange
and wonderful foreign letters along the way). Do this for
each and every letter in the word you're trying to spell
out, spelling the word "google" like so: 4 666
666 4 555 33. Notice the gap between the 666 and 666?
What you're after is two "o"s in a row, but typing
666666 will get you either a single "o" or an " "
since your phone doesn't know when you want to move
on to the next letter. To type two of the same characters
one after another, either wait a second or so after
tapping in the first "o" or jiggle your phone's joystick to
the right or down.
When it comes to special characters like the dot (.) and
slash (/) common in web addresses, you'll turn to the 1
button. A period or dot is a single tap. The slash is
usually 15 taps. For those of you keeping score at home,
that'll leave you with 9 2 7 1 4 666 666 4 555 33 1
222 666 6 111111111111111 9 6 555 for wap.google.com/wml.
The texting equivalent of the spacebar is the 0 button.
What of digits? Surely you don't need to type 17 or so
1s scrolling through all the symbols associated with the
1 button ([.,-?!'@:;/()]) just to get back around to the 1
you wanted in the first place. Thankfully, all it takes is
holding down the button for a second or so to jump right
to the numeral. So instead of tapping through WXYZ to
get to 9, hold down that 9 key for a moment or so and
you're there.
There are more efficient input techniques, such as T9
("Text on 9 keys") and other predictive text systems,
but they're not as useful for entering possibly obscure
words like those in web addresses and Google
searches.
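The multi-tap scheme described in this sidebar is easy to model. The Python sketch below assumes the standard letter layout (2 holds ABC through 9 WXYZ), a single tap of 1 for the dot, and 0 for the space; the function names multitap and needs_pause are made up for illustration, and other symbols are deliberately left out because their tap counts vary by phone.

```python
# Letter layout from the sidebar: 2 holds ABC, 3 DEF, and so on to 9 WXYZ.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Map each letter to its key presses, for example "c" -> "222".
TAPS = {
    letter: key * (index + 1)
    for key, letters in KEYPAD.items()
    for index, letter in enumerate(letters)
}
TAPS[" "] = "0"  # the 0 button acts as the spacebar
TAPS["."] = "1"  # a dot is a single tap of the 1 button


def multitap(word):
    """Return the key presses for each character of word."""
    return [TAPS[ch] for ch in word.lower()]


def needs_pause(word):
    """Return positions where a character reuses the previous key."""
    taps = multitap(word)
    return [i for i in range(1, len(taps)) if taps[i][0] == taps[i - 1][0]]
```

Here multitap("google") gives 4, 666, 666, 4, 555, 33, and needs_pause("google") reports position 2, the second "o", which is where you would wait a moment or nudge the joystick. Characters outside the table raise a KeyError.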
Browser in hand, point it at wap.google.com/wml, tap in a search
(without tripping over your fingers), and click the Search button
or link (as shown in Figure 5-28, left). A few moments later, your
first set of results shows up (as shown in Figure 5-28, center).
Scroll to the bottom of the results and click the Next link (shown
in Figure 5-28, right) to move on to the next page of results.
Figure 5-28. Google wireless search (left), first
results (center), and link to the next 5 results
(right)
Click any of the results to visit the page in question, just as you
would in a normal browser. You'll notice immediately that the
pages you visit by clicking a result link are dumbed down (similar
to Google's wireless search itself) to suit the needs of your
mobile's display abilities.
Truth be told, you're not visiting the resulting page directly at
all. What you see on your screen and in Figure 5-29 is courtesy
of the Google WAP proxy, a service turning HTML pages into
WAP/WML (think of it as HTML for wireless devices) on the fly.
Click another link on the resulting page and you'll continue
browsing via the Google proxy, Google essentially turning all the
Web into a mobile Web.
Figure 5-29. An ordinary web page as seen
through the lens of the Google WAP proxy
In fact, you can actually surf rather than search the Web using
the Google WAP proxy. Find your mobile browser's Options menu
and click the Go to URL link. In the resulting page, enter any
web site URL into the Go to URL box and click the Go button to
visit a mobile version of that page.
The Options menu is chock-full of
additional options provided by the Google
WAP site: search the full Web, the mobile
Web (sites Google has found to be
optimized for mobile devices), language
["Language Tools" in Chapter 1], and Help
documentation.
5.10.3. Google by SMS
As a New York Times article ("All Thumbs, Without the Stigma";
http://tech2.nytimes.com/mem/technology/techreview.html?res=9E00E6DE163FF931A2575BC0A9629C8B63)
suggested recently, the thumb is the power digit. While the
thumboard of choice for executives tends to be the Blackberry
mobile email device (http://www.blackberry.com/), for the rest of
the world (and many of the kids in your neighborhood), it's the
cell phone and Short Message Service (SMS).
SMS messages are quick-and-dirty text messages (think mobile
instant messaging) tapped into a cell phone and sent over the
airwaves to another cell phone for around 5 to 10 cents apiece.
But SMS isn't just for person-to-person messaging. In the UK,
BBC Radio provides so-called shortcodes (really just short
telephone numbers) to which you can SMS your requests to the
DJ's automated request-tracking system. You can SMS bus and
rail systems for travel schedules. Your airline will SMS you
updates on the status of your flight. And now you can talk to
Google via SMS as well.
Google SMS (http://www.google.com/sms/) provides an SMS
gateway for querying the Google Web index, looking up phone
numbers [Hack #6], seeking out definitions [Hack #10], and
comparative shopping in the Froogle product catalog service
(http://froogle.google.com).
Simply send an SMS message to U.S. shortcode 46645 (read:
GOOGL), as shown later in Figure 5-30, with one of the following
forms of query:
Web Search
Search the Google Web by prefixing your query with a G
(upper- or lowercase). You'll receive the top two results
in return, formatted as text snippets, hopefully
containing some information of use to you.
g capital of south africa
G answer to life the universe and everything
Google Local Business Listing
Consult Google Local's business listings by passing it a
business name or type and a city and state combination or
Zip Code.
vegetarian restaurant Jackson MS
southern cooking 95472
scooters.New York NY
The Google SMS documentation
suggests using a period (.) between
your query and city name or zip
code to be sure that you're
triggering a Google Local Search.
Residential Phone Number
Find a residential phone number with some combination
of first or last name, city, state, Zip Code, or area code.
augustus gloop Chicago il
violet beauregard 95472
mike teevee ny
As with any Google Phonebook
[Hack #6] query, you'll find only
listed numbers in your results.
Froogle Prices
Check the current prices of items for sale online through
Froogle (http://froogle.google.com/). To trigger a Froogle
lookup, prefix your query with an F (upper- or lowercase),
price, or prices (the latter two will also work at the end of
the query).
f nokia 6230 cellphone
price bmw 2002
ugg boots prices
Definition
Rather than scratching your head trying to understand
just what Ms. Austen means by disapprobation, ask
Google for a definition [Hack #10]. Prefix the word or
phrase of interest with a D (upper- or lowercase) or the
word define.
D disapprobation
define osteichthyes
Calculation
Perform feats of calculation and conversion using the
Google Calculator [Hack #47].
(2*2)+3
12 ounces in grams
Zip Code
Pass Google SMS a U.S. Zip Code to find out where one
might find it in the country.
95472
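The prefix rules in the list above can be summarized as a small classifier. The Python sketch below is purely illustrative: it encodes only the rules stated here (G, F, price/prices, D, define, and a bare five-digit Zip Code) and lumps everything else together, since how Google SMS itself tells local, phonebook, and calculator queries apart is not described. The function name classify_sms_query is hypothetical.

```python
import re


def classify_sms_query(query):
    """Guess the query type from the prefixes described in the text."""
    words = query.strip().split()
    if not words:
        return "empty"
    first, last = words[0].lower(), words[-1].lower()
    # A lone five-digit number is treated as a Zip Code lookup.
    if len(words) == 1 and re.fullmatch(r"\d{5}", first):
        return "zip code"
    if len(words) > 1:
        if first == "g":
            return "web search"
        if first in ("f", "price", "prices") or last in ("price", "prices"):
            return "froogle"
        if first in ("d", "define"):
            return "definition"
    # Local listings, phone numbers, and calculations carry no prefix.
    return "other"
```

Running the examples from the list through it shows why prefixes matter: "southern cooking 95472" and "(2*2)+3" both fall into the catch-all bucket, where only Google's own parsing can sort them out.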
Google SMS is sure to sport more features
by the time you read this. Be sure to
consult the "Google SMS: How to Use"
page at
http://www.google.com/sms/howtouse.html
for the latest or (for the real thumb jockeys
among you) subscribe your email address to
an announcement list from the Google
SMS home page.
You'll receive your results as one or more SMS messages
labeled, appropriately enough, (1of3, 2of3, etc.), as shown in
Figure 5-30. Notice that there are no URLs or links in the
responses: what's the point when you can't click on them?
Figure 5-30. A Google SMS query (left) and
response (right)
While the cost of sending an SMS
message (typically between 5 and 10
cents apiece) is usually borne by the
sender, automated messages like those
sent by Google SMS are usually charged
to you, the receiver. Unless you have an
unlimited SMS plan, all that googling can
add up. Be sure to check out what's
included in your mobile plan, check your
phone bill, or call your mobile operator
before you spend a lot of time (and money)
on this service.
5.10.4. Froogle on the Go
If you wish you could compare prices at that "One Day Sale" on
kitchen gadgets without leaving the store, Wireless Froogle
(http://froogle.google.com) is as much a part of the shopping
experience as that credit card.
Point your mobile browser at wml.froogle.com, tap in the name of
the product you're about to take to the checkout (Figure 5-31,
left), and up pops a list of prices as advertised by online vendors
(Figure 5-31, right).
Figure 5-31. Wireless Froogle Search (left) and
results (right)
You'll find everything from cellular phones to yogurt makers,
abacuses to faux yak fur coats on Froogle.
At the time of this writing, Wireless Froogle is nowhere near as
complete as one might hope. You can't constrain your results by
price, group them by store, or sort them in any way. Results
don't link to anywhere. That said, it is still a handy price-check
tool as you're standing in that checkout line.
$49.99 for a pashmina: lemme at it! Sometimes instant
gratification is worth it; sometimes paying only $49.99 for silk
is well worth the wait.
Hack 68. Visit the Google Labs
Google Labs, as the name suggests, sports Google's
experiments, fun little hacks, and inspirational uses of the
Google engine and database.
Be sure not to miss Google Labs (http://labs.google.com). The
whole point of this part of Google's site is that things will appear,
vanish, change, and basically do whatever they want. So, while
the site might be different by the time you read this, it's still
worth covering, because you might find one of the tools here
useful in sparking ideas.
At the time of this writing, there are a number of experiments
running at the lab, some of which are covered in depth elsewhere
in this book:
Google Desktop Search (http://desktop.google.com)
Covered in [Hack #61].
Google SMS (http://sms.google.com)
Covered in [Hack #67].
Site-Flavored Google Search Box
(http://www.google.com/services/siteflavored.html)
Tailor a Google search box to return results of a
particular slant (e.g., kids' content, computers: hacking,
etc.).
Google Groups 2 (http://groups-beta.google.com)
Covered in [Hack #56].
Google Personalized Web Search
(http://labs.google.com/personalized)
Tailor your Google results to suit your interests;
essentially an individualized version of Site-Flavored
Google Search.
Froogle Wireless (http://labs.google.com/frooglewml.html)
Covered in [Hack #67].
Google Deskbar (http://toolbar.google.com/deskbar)
Covered in [Hack #60].
Google Compute (http://toolbar.google.com/dc/offerdc.html)
An add-in to the Google Toolbar [Hack #60], Google
Compute borrows a few cycles from your computer as it
sits idle and applies this computing energy to solve
difficult scientific problems around the world.
Google Sets (http://labs1.google.com/sets)
Enter a few terms, and Google will try to come up with an
appropriate set of phrases. For example, enter Amazon
and Borders, and Google will come up with Borders,
Amazon, Barnes & Noble, Buy.com, Media Play,
SunCoast, Samgoody, etc. It doesn't always work like
you'd expect. Enter vegan and vegetarian and you'll get
veal, Valentine's Day, Tasmania; it goes a bit far afield.
Clicking any item in the group list will launch a regular
Google search.
Google WebQuotes (http://labs.google.com/cgi-bin/webquotes/)
Many times, you can learn the most about a web page by
what other web pages say about it. Google WebQuotes
takes advantage of this fact by providing a preview of
what other sites are saying about a particular link before
you actually meander over to the site itself.
From the Google WebQuotes home page, specify how
many WebQuotes you'd like for a particular search (the
default is three, a number that I find works well) and
enter a search term. Google WebQuotes returns the top
10 sites (or, if you suffix the resultant URL with &num=100,
the top 100 sites) with as many WebQuotes for each
page as you specified. Note, however, that not every
page has a WebQuote.
This comes in rather handy when you're doing some
general research and want to know immediately whether
the search result is relevant. When you're searching for
famous people, you can get some useful information on
them this way, too, and all without leaving the search
results page!
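The &num=100 trick mentioned for WebQuotes is just a matter of setting one query parameter on a results URL. Here is a minimal Python sketch of that idea; the function name with_num and the example.com URL are hypothetical stand-ins, not anything Google provides:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_num(url, num=100):
    """Return url with its 'num' query parameter set to num.

    Any existing 'num' value is replaced rather than duplicated.
    """
    parts = urlsplit(url)
    # Keep every parameter except an earlier 'num', preserving order.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "num"]
    query.append(("num", str(num)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical results URL, used only for illustration.
print(with_num("http://example.com/search?q=perl"))
```

Paste in whatever results URL your browser shows in place of the example one; the function only touches the query string.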
Hack 69. Find Out What Google Thinks
___ Is
What does Google think of you, your friends, your neighborhood,
or your favorite movie?
If you've ever wondered what people think of your home town,
your favorite band, your favorite snack food, or even you,
Googlism (http://www.googlism.com) may provide you with
something useful.
5.12.1. The Interface
The interface is dirt simple. Enter your query and check the
appropriate radio button to specify whether you're looking for a
who, a what, a where, or a when. Figure 5-32 shows a
representative results page for Clive Sinclair, inventor of the
Sinclair ZX-80 personal computer
(http://www.nvg.ntnu.no/sinclair/computers/zx80/zx80.htm).
You can also use the tabs to see what other objects people are
searching for and what searches are the most popular. A word of
warning: some of these are not safe for work.
Figure 5-32. Googlism results for Clive Sinclair
5.12.2. What You Get Back
Googlism will respond with a list of things Google believes about
the query at hand, be it a person, place, thing, or moment in
time. For example, a search for Perl and "What" returns, along
with a laundry list of others:
Perl is a fairly straightforward
Perl is aesthetically pleasing
Perl is just plain fun
These are among the more humorous results for Steve Jobs and
"Who":
steve jobs is my new idol
steve jobs is at it again
steve jobs is apple's focus group
To figure out what page any particular statement comes from,
simply copy and paste it into a plain old Google search. That
last statement, for instance, came from an article titled
"Innovation: How Apple does it" at
http://www.gulker.com/ra/appleinnovation.html.
5.12.3. Practical Uses
For the most part, this is a party hack, a good party hack. It's a
fun way to aggregate related statements into a silly (and
occasionally profound) list.
But that's just for the most part. Googlism also works as a handy
ready-reference application, allowing you to quickly find answers
to simple or simply asked questions. Just ask them of Googlism
in a way that can end with the word is. For example, to discover
the capital of Virginia, enter The capital of Virginia. To learn why
the sky is blue, try The reason the sky is blue. Sometimes this
doesn't work very well; try the oldest person in the world and
you'll immediately be confronted with a variety of contradictory
information. You'd have to visit each page represented by a
result and see which answer, if any, best suits your research
needs.
5.12.4. Expanding the Application
This application is a lot of fun, but it could be expanded. The
trick is to determine how web page creators generate
statements.
For example, when initially describing an acronym, many writers
use the words "stands for". So you could add a Googlism that
searches for your keyword and the phrase "stands for." Do a
Google search for "SETI stands for" and "DDR stands for" and you'll
see what I mean.
When referring to animals, plants, and even stones, the phrase
"are found" is often used, so you could add a Googlism that
located things. Do a Google search for sapphires are found and
jaguars are found and see what you find.
See if you can think of any phrases that are in common usage,
and then check those phrases in Google to see how many
results each phrase has. You might get some ideas for a
topic-specific Googlism tool yourself.
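To make the "stands for" and "are found" idea concrete, here is a rough Python sketch of the two pieces a homegrown Googlism would need: building the quoted phrase query, and pulling out whatever follows the phrase in a bit of result text. It does not talk to Google at all; the names PATTERNS, pattern_query, and completions are made up for illustration, and the sample text is simply typed in by hand.

```python
import re

# Connecting phrases that writers tend to use; extend with your own.
PATTERNS = {
    "what": "is",
    "acronym": "stands for",
    "where": "are found",
}


def pattern_query(keyword, kind):
    """Build a quoted phrase query such as "SETI stands for"."""
    return '"{} {}"'.format(keyword, PATTERNS[kind])


def completions(text, keyword, kind):
    """Return what follows 'keyword phrase' in text, up to the end of each sentence."""
    phrase = "{} {}".format(keyword, PATTERNS[kind])
    regex = re.compile(re.escape(phrase) + r"\s+([^.!?\n]+)", re.IGNORECASE)
    return [match.group(1).strip() for match in regex.finditer(text)]


# Hand-typed sample text standing in for search result snippets.
sample = "SETI stands for Search for Extraterrestrial Intelligence."
print(pattern_query("SETI", "acronym"))
print(completions(sample, "SETI", "acronym"))
```

Feeding the quoted query to a search and the returned snippets to completions is left to you; how you fetch results is outside the scope of this sketch.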
Hack 70. The Search Engine Belt Buckle
Take the Web out for a night on the town.
It was a late August Saturday night in Seattle. We decided not
only to hit the dance floor, but to boogie down in a whole new way.
All the cats in town were wearing big belt buckles then, so we
thought, hey, here's our chance to strut our latest hack: the
Search Engine Belt Buckle.
5.13.1. What in Blazes Is a Search Engine
Belt Buckle?
The Search Engine Belt Buckle is a repurposed PDA, shown in
Figure 5-33, that displays a scrolling list of 24 hours' worth of
all the bizarre and banal things that people are looking for on the
Web, right there just above or below your navel, depending on
local custom or personal preference.
Figure 5-33. The author, sporting the Search
Engine Belt Buckle
Just to give you some idea of the sort of thing that you're in for,
here is a smattering of queries scrolling across my belt buckle's
screen at the time of this writing:
"olympic nude athletes"
"leaving the scene of an accident"
"night diaper bondage"
"food"
"used juicer"
"homeopathic sinus remedies"
The Search Engine Belt Buckle has enough battery power to last
for about two to three hours, plenty of time for gettin' down and
attracting (or warding off) the ladies (or the gents), as the case
may be. If there's WiFi in the area, it'll stream live queries, but
since that's always an unknown, we have a few hours of search
queries on hand at all times.
5.13.2. Step 1: The Video
As our source, we used SearchSpy
(http://www.dogpile.com/info.dogpl/searchspy), a groovy
scrolling list of search terms submitted to the Dogpile
(http://www.dogpile.com) meta-search engine. We captured a
good 24 hours' worth to keep in the cache.
Capture SearchSpy results by pointing your browser at either
http://www.dogpile.com/info.dogpl/searchspy/results.htm?filter=1
for "family-friendly real-time searches" or
http://www.dogpile.com/info.dogpl/searchspy/results.htm?filter=0
for "unedited real-time Web searches. Consider yourself
warned."
5.13.2.1 Shoot the footage
Grab and install a copy of Windows Media Encoder 9
(http://www.microsoft.com/windows/windowsmedia/9series/encode
free). In the New Session wizard, shown in Figure 5-34, click
"Capture screen" and the OK button.
Figure 5-34. Windows Media Encoder's Wizard
walks you through screen capture setup
Yes, we could have grabbed the XML from
the Flash SWF file (bear in mind that the
XML would need to be updated every so
often for the latest results, something
that's not possible if you don't have an
Internet connection in the disco) and built
a custom app to do all the display work,
but we wanted to make this approachable
for the typical reader, an odd definition of
typical, I grant you, given that we're
making a search engine belt buckle. By
using a garden-variety video file, we can
display it on a broader spectrum of
systems and mediums.
In the next menu choose "Region of the screen" and click the
Next button.
Click the "Use selection button" and drag an outline around the
scrolling search results, indicated by the rectangle in
Figure 5-35.
Figure 5-35. Drag an outline around the scrolling
search results
Pick a name for the file and click Next. Choose an encoding
method; we picked Medium (Figure 5-36) because all we're
displaying here is text on a belt buckle. Click the Next button to
continue.
Figure 5-36. Medium encoding is good enough
for scrolling text
Give the video a title, author, and so forth. Click the Finish
button when you're done.
Windows Media Encoder is now recording all those scrolling
searches in real time. Capture a good 24 hours' worth of
scrolling search terms to keep in the cache, as it were. When
you think you've got enough, click the Encoder's application
icon on your Windows taskbar and stop the recording.
To give you some idea of the sort of thing you should expect to
see scrolling across your belt buckle, take a gander at the
10-minute sample at
http://www.engadget.com/common/videos/pt/search.wmv
(Windows Media).
5.13.2.2 Encode for Pocket PC
Before you throw this video at your Pocket PC, you'll want to
recode it to play full screen in the Pocket PC version of Windows
Media Player.
Close the Windows Media Encoder and start it back up again. In
the starting Wizard, click "Convert a file," click Browse, and
choose the video file you just recorded. Click the Next button
and choose Pocket PC.
In the Encoding Options window, select "Pocket PC widescreen
video (CBR)" from the Video pull-down menu (see Figure 5-37);
this will encode the video as 320x240. Click Finish and go get a
cup of coffee as it churns through the job of re-encoding.
Figure 5-37. Re-encode the video to fill the
Pocket PC's screen
When the re-encoding is done, move that file over to the Pocket
PC. While you can do so over the USB cable with ActiveSync or
even send it through the ether over Bluetooth, the simplest
method is to put it onto an SD card (using a card reader plugged
into your computer) and pop the card into your Pocket PC.
5.13.2.3 Get the settings just right
Find the file in your Pocket PC's File Explorer, click it to start it
playing (as shown in Figure 5-38), and click the Stop button.
Figure 5-38. Footage playing in the Pocket PC
version of Windows Media Player
Tap Tools → Settings → Audio & Video. From the "While
using another program" pull-down menu, choose "Continue
playback" and select "Always" from the "Play video in full
screen" menu. These settings are shown in Figure 5-39. Tap the
OK circle at the top-right.
Figure 5-39. Setting Audio and Video play
options
Tap Tools → Repeat to have the video play again and again,
uninterrupted.
Now we need to turn off the Power Management
nonsense, necessary for day-to-day Pocket PC usage, but not
optimal for making sure our belt buckle is always on. Tap the
Start Menu → Settings → System → Power → Advanced
tab. Uncheck the "Turn off device if not used for" checkbox, as
shown in Figure 5-40.
Figure 5-40. Keep your belt buckle groovin'
Tap the "Adjust backlight settings to conserve power" link and
uncheck the "Turn off backlight if device is not used for"
checkbox (Figure 5-41). This will keep the device on (and you
looking groovy) until you press the power button or run out of
juice, the Pocket PC or your dancin' feet, whichever comes first.
Figure 5-41. Keep your belt buckle glowin'
Tap the OK circle at top-right to finish.
Tap the Start Menu, followed by the Windows Media Player icon.
Tap Play and your search video will play as long as you want it
to, rotated to the right for optimal belt buckle viewing
(Figure 5-42).
Figure 5-42. Playback is oriented to the right for
optimal viewing
If you want, you can also adjust the brightness (the Brightness
tab in the Backlight settings window), depending on the vibe.
All of this works just as well with any other
hip video you might like to strap on to your
midriff. And you can always edit the video,
alter color, add effects and transitions, and
so forth. How about a collage of digital
photos from the Google Images gallery
[Hack #51]?
5.13.3. Making the Belt Buckle
Now, of course, you can stop here. You don't really need to make
this into a belt buckle. It's rather mesmerizing in and of itself
and is an entertaining addition to your desk at the office
(assuming you chose Filtered mode, that is).
That said, we just couldn't resist the temptation to make a big,
bad belt buckle. So we grabbed a few supplies (belt, shiny beads,
black electrical tape, Velcro, and a hot glue gun, shown in
Figure 5-43) from around the house, and we were off.
Figure 5-43. You'll find all you need lying about
the house or at your corner craft shop
First, we wrapped the Pocket PC in black electrical tape, leaving
only the screen and useful buttons showing, as shown in
Figure 5-44.
Figure 5-44. "Disappear" that Pocket PC with a
roll of black electrical tape
For some flash, we hot-glued shiny beads (from here on out
called studs) to the tape around the edges, as shown in Figure 5-45.
Figure 5-45. Glue on some flash
To attach the buckle to the belt, we stuck one side of a strip of
adhesive Velcro to the back of the Pocket PC, the other side to
the unadorned buckle of a simple black belt we had in our closet,
as shown in Figure 5-46.
Figure 5-46. Velcro the buckle to the belt
Put on your dancin' shoes, disco shirt, some natty slacks,
Search Engine Belt Buckle (Figure 5-47), and enjoy a night out
on the town.
Figure 5-47. The finished product, a snazzy
Search Engine Belt Buckle
Phillip Torrone
Chapter 6. Gmail
Hacks 71-80
Section 6.2. Gmail Search Syntax
Section 6.3. Additional Resources
Hack 71. Glean a Gmail Invite
Hack 72. Create and Use Custom Addresses
Hack 73. Import Your Contacts into Gmail
Hack 74. Import Mail into Gmail
Hack 75. Export Your Gmail
Hack 76. Take a Walk on the Lighter Side
Hack 77. Gmail on the Go
Hack 78. Use Gmail as a Linux Filesystem
Hack 79. Use Gmail as a Windows Drive
Hack 80. Program Gmail
Hacks 71-80
Google's Gmail web-based email service (www.gmail.com) isn't
your ordinary web mail service. Maybe you're attracted to the
slick, interactive, real application-like JavaScript-powered web
interface. Or the command-line jockey in you likes the Pine-like
keyboard shortcuts (Pine is a text-only email application,
typically found on Unix systems). Or is it the sheer volume of
storage (one gigabyte, at the time of this writing) that's made you
question your relationship with your existing web mail service
and its puny 50 megabyte allotment? Most are enticed by the
promise (and delivery, mind you) of a Google-like search
interface to their email.
There was a day when a simple off-by-one
(technically, an off-by-999 error) caused
quite a stir among early Gmail users.
Logging into your Gmail account, you were
met with the double take-worthy: "You are
currently using 16 MB (0%) of your
1000000 MB." I'll see your gigabyte and
raise you a terabyte.
Whatever your reasons for trying, switching to, or lusting after a
Gmail account, you're sure to be delighted both by its proper and
"improper" uses, the latter being the focus of this chapter.
As with all things Google, the official interface to Gmail is only
one of many. Thanks to some clever screen scraping, analysis of
the data model and format underlying the candy-coated
JavaScript frontend, and some good old tinkerer's enthusiasm,
you can use Gmail as everything from a filesystem [Hack #78
and Hack #79] to a backup server [Hack #80] to a mobile email
account for Gmail on the go.
6.2. Gmail Search Syntax
Gmail offers a rich search syntax for rooting through your email
message archive, as if you'd expect, or indeed stand for, any less.
from:
Digs through the headers of your email message archive
in search of mail sent by someone matching the keyword
that you provide.
from:[email protected]
to:
The yang to from:'s yin, to: finds all messages sent to
someone matching a provided keyword. (Don't forget
plus-addressing [Hack #72].)
to:[email protected]
to:[email protected]
subject:
Matches messages with a particular subject.
subject:"meeting notes"
label:
Looks for messages with a particular label applied.
label:knitting
has:attachment
The has: syntax has only one possible value (at least at
the time of this writing): attachment. has:attachment in a
query returns only messages having one or more
attachments.
has:attachment
filename:
Finds messages with an attachment filename matching a
provided pattern. Used with just a file extension (e.g., pdf
or txt), filename: turns up all messages with
attachments of a particular type.
filename:meeting_notes.txt
filename:pdf
in:
Returns a list of messages in a particular collection
(read: folder). Acceptable values for in: are inbox, trash,
spam, and anywhere (trash and spam are not included in
searches unless explicitly included using in:trash,
in:spam, or in:anywhere). Oddly enough, sent isn't a
usable value for in:.
in:inbox
in:anywhere
is:
Acceptable values for is: are starred, unread, and read,
which return starred, unread, and read messages,
respectively.
is:read
cc:
Finds messages carbon copied to particular recipients.
cc:[email protected]
bcc:
Finds outgoing messages blind carbon copied to
particular recipients. Note that bcc: won't work on any
incoming mail since there's no way to tell who was on
the bcc line.
bcc:[email protected]
before:
Matches messages sent or received before a particular
date, specified in yyyy/mm/dd format. Unfortunately,
partial dates (year only, or year and month) don't find
anything at all.
before:2004/10/02
after:
Matches messages sent or received on or after a particular
date, specified in yyyy/mm/dd format.
after:2004/11/21
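Since before: and after: insist on full yyyy/mm/dd dates, it can help to let a few lines of code do the formatting. The following Python sketch only builds the query text to paste into the Gmail search box; the helper names gmail_date and date_range are invented for illustration and are not part of any Gmail interface.

```python
from datetime import date


def gmail_date(day):
    """Format a date the way before: and after: expect, yyyy/mm/dd."""
    return day.strftime("%Y/%m/%d")


def date_range(after=None, before=None):
    """Build the date portion of a Gmail query from optional bounds."""
    terms = []
    if after is not None:
        terms.append("after:" + gmail_date(after))
    if before is not None:
        terms.append("before:" + gmail_date(before))
    return " ".join(terms)


print(date_range(after=date(2004, 11, 21)))
print(date_range(after=date(2004, 10, 2), before=date(2004, 11, 21)))
```

Working from real date objects means a month or day can never be left off by accident, which matters given that partial dates find nothing.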
6.2.1. Phrase Searches
Enclose phrases in double-quotes (") to have Gmail search treat
them as a unit to be matched exactly (case isn't taken into
account). The following query finds only accounting department
reports:
Subject:"accounting department report"
6.2.2. Basic Boolean
The only Boolean operator supported by Gmail search is OR
(uppercase is required). In the absence of the OR operator, AND is
implicit.
The Boolean OR operator works in Gmail searches just as it does
in Google Web searches: specify that any one word or phrase is
acceptable by putting an OR between each, such as this query,
which finds all messages from the boss or with their subjects
marked as urgent:
from:[email protected] OR subject:urgent
6.2.3. Negation
The negation operator (-) also works as it does in Google Web
Search, excluding messages matching the negated keyword or
operator:keyword pair. So, the following query turns up all
messages to my Example Co. not sent from the company's
special offers department:
to:@examplecom -from:offers@
6.2.4. Grouping
Parentheses are used a little strangely in Gmail queries. When
enclosing a set of words, they specify that all of those words
must be found to be considered a match. So, the following
matches messages sent to both Sam and Mira:
to:(sam mira)
Throwing in an OR allows optional matches while being explicit
about groups of options; while we humans tend to be able to parse
precedence without need of parentheses, search engines need a
little more help. The following query finds all messages sent to
Sam about rockets or helicopters:
to:sam subject:(rockets OR helicopters)
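If you find yourself composing these queries over and over, a few tiny helpers can keep the quoting, ORs, and parentheses straight. This is a sketch in plain Python that produces query strings only; the function names (term, any_of, all_of, exclude) are hypothetical, and nothing here contacts Gmail.

```python
def term(operator, value):
    """Join an operator and value, quoting bare multi-word phrases."""
    if " " in value and value[0] not in '("':
        value = '"{}"'.format(value)
    return "{}:{}".format(operator, value)


def any_of(*values):
    """Group alternatives with an explicit uppercase OR."""
    return "(" + " OR ".join(values) + ")"


def all_of(*values):
    """Group words that must all match (AND is implicit)."""
    return "(" + " ".join(values) + ")"


def exclude(expression):
    """Negate a keyword or operator:keyword pair."""
    return "-" + expression


print(term("to", all_of("sam", "mira")))
print(" ".join([term("to", "sam"),
                term("subject", any_of("rockets", "helicopters"))]))
print(exclude(term("from", "offers@")))
```

The three printed queries mirror the grouping and negation examples above; paste the output into the search box as is.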
6.2.5. Mixing Syntax
Gmail's various search operators tend to play well together.
While the tendency is to start out with minimal search criteria
and keep whittling down, with a large number of email messages,
crafting your searches can start to take a lot of work. Take a
chance and provide as much information as you know about the
message you're after and back off bit by bit if you don't find it.
The following query, for instance, is one that I just couldn't pull
off in my computer's email client:
from:Duncan before:2004/10/01 subject:today "World Cup" lunch
6.3. Additional Resources
As with all Hacks books, what you find here is just a taste of
what's most likely available by the time this book ends up in
your hands. Here are a few more resources you might visit:
The Gmail documentation
(http://gmail.google.com/support) is chock-full of tips,
tricks, keyboard shortcuts, search syntax, and more.
Gmail Gems (http://gmailgems.blogspot.com) "reveals
the tips and tricks of Gmail masters."
Justin Blanton's "Getting More Out of Gmail"
(http://justinblanton.com/archives/2004/06/20/getting_m
provided much grist and many pointers for this chapter.
GmailForums (http://www.gmailforums.com), as the
name suggests, is a place to discuss all things Gmail.
Mark Lyon, author of Google Email Loader [Hack #74], has
collected a good list of apps and hacks
(http://www.marklyon.org/gmail/gmailapps.htm).
And, of course, you can always Google for gmail hacks
and gmail hacking.
Remember that all of these are hacks and,
as such, have no quality of service
guarantees; if they break, they break.
About all you can do is go back to the
hack's home page and see if there's a new
version available.
Hack 71. Glean a Gmail Invite
Ask a friend, acquaintance, or stranger; swap, auction, or
finagle. A Gmail invite is hard to come by, but not that hard.
Gmail is one hot property and Gmail accounts are not available
to just anyone. Yes, it's a free web mail service, but free doesn't
necessarily mean freely available, much to the chagrin of those
just itching to give it a whirl. You have to be invited, either by a
Googler (someone working at Google) or a friend willing to spend
one of their occasionally available Gmail invites on you.
Hmm . . . scarce commodity, high demand . . . sounds like a
market to me.
And that's precisely what's happened. Gmail accounts are
meted out to close friends, traded for wares and services,
auctioned off, donated, and otherwise trafficked in a marketplace
of sorts.
So, where do I glean myself a Gmail invite?
Ask a friend
Chances are one of your alpha-geek friends has a Gmail
account. Ask nicely and be prepared to offer a latte or
three.
Ask an acquaintance
Email acquaintances with Gmail accounts are easy to
spot: just look for the @gmail.com email address. Set up
a filter in your email application to highlight any
incoming Gmail and rifle off a response the moment you
see one pop up. Your ingenuity and bravado are sure to
be admired, and hopefully rewarded.
Request one of a stranger
The isnoop.net Gmail invite spooler
(http://isnoop.net/gmailomatic.php) offers "a place for
people with Gmail invites and those who want them to
come together with minimal effort and fuss."
eBay for one
Yes, I know it seems silly, but Gmail invites are going for
between $0.30 and $3.00 on eBay.
Swap something
Gmail swap (http://www.gmailswap.com) is a virtual
swap meet for Gmail invites where people offer
everything from CDs to kisses for an invite. If you've an
invite or three to trade, ask for a joke, picture, or
"anything Disney" and bring your sense of humor.
Join the military
Gmail for the Troops (http://www.gmailforthetroops.com)
and Gmail 4 Troops (http://www.gmail4troops.com) are
sites dedicated to garnering Gmail accounts for troops
currently serving to keep in touch with their loved ones
at home.
Google for it
Try searching Google for "have * Gmail invites" (wow,
that full-word wildcard really comes in handy!). Often
webloggers who have Gmail invites available will post
about it on their weblog. Even if you've found an old
entry, you've found someone with a Gmail account, and
Google periodically refreshes the number of invites a
user has available.
By the time you read this, Gmail may well be freely available. If
so, think of this hack as a moment in time when Gmail was the
geek equivalent of a collector's plush toy.
Rael Dornfest and Justin Blanton
Hack 72. Create and Use Custom
Addresses
Make up an unlimited number of arbitrary email addresses to use
when signing up for something, making a purchase online, or
tracking a conversation.
Those who've been exposed to the power of a little something
called plus-addressing never look back, using it anywhere and
everywhere they can. And, for something so useful, there's really
not much to it.
Simply append a plus sign (+) and some meaningful string of
letters or numbers (meaningful to you, that is) to the first part of
your email address (the part before the "at" sign, @) and you have
a way of tagging a particular conversation, marking an address
used to sign up for a service or buy something online, or creating
a throwaway address you have no intention of paying attention to
again.
Say your email address is raelity@gmail.com. A plus-addressed
version might be raelity+shopping@gmail.com. And you don't
have to stop there; you can create subtags and sub-subtags
such as raelity+shopping+amazon@gmail.com and
raelity+shopping+amazon+books@gmail.com for even more
granularity.
And the magic of it is that all plus-addressed email still arrives
at the same email address: yours, sans the plus bit. At that
point you can filter, sort, highlight, or trash email sent to that
particular address as you see fit.
Plus-addressing means never having to say you only have one
email address again.
And you'll be glad to know that Gmail supports plus-addressing,
affording you some rather powerful email handling, routing, and
filtering functionality.
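The mechanics of plus-addressing amount to simple string handling. Here is a minimal Python sketch that builds a plus-address from a base address and any number of tags, and splits one back apart; the function names are made up for this example, and the sketch assumes an ordinary local@domain address with nothing exotic in it.

```python
def plus_address(address, *tags):
    """Insert +tag (and +subtag...) before the @ of address."""
    if not tags:
        return address
    local, domain = address.rsplit("@", 1)
    return "{}+{}@{}".format(local, "+".join(tags), domain)


def split_plus_address(address):
    """Return (base address, list of tags) for a plus-address."""
    local, domain = address.rsplit("@", 1)
    base, *tags = local.split("+")
    return "{}@{}".format(base, domain), tags


print(plus_address("raelity@gmail.com", "shopping", "amazon"))
print(split_plus_address("raelity+shopping+amazon+books@gmail.com"))
```

The second function is handy on the receiving end: it tells you both whose mail this is and which tag the sender was handed.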
Some of my favorite uses of plus-addressing are:
Tagging a conversation
Keep track of a particular email conversation, no matter
how long it lasts, by copying yourself (i.e., putting yourself
in the Cc: field) with a plus-address (e.g.,
raelity+conundrum@gmail.com or
raelity+tag+conundrum@gmail.com). That way, so long
as you're copied on any ongoing conversation, you'll
know just where it all started (and, hopefully, eventually
ended).
Inviting people to a party
This is just a variation on the previous theme of tagging
a conversation. Invite people to a party and copy
yourself with a plus-address (e.g.,
raelity+scavengerhunt@gmail.com or
raelity+rsvp+scavengerhunt@gmail.com) to label and
track RSVPs.
Signing up for services
Just about every online service has you provide an email
address in order to sign up. If you never want to hear by
mail from these people again (aside from the initial, and
often required, confirmation email, that is), assign a
plus-address to each service (e.g.,
raelity+morningtimes@gmail.com or
raelity+service+morningtimes@gmail.com) and, when
you've had quite enough of their follow-up messages,
announcements, and special offers, set up a filter
(http://gmail.google.com/support/bin/answer.py?answer=6579&query=filter&topic=&type=f)
to direct them right into the Trash.
Buying things online
Buying things online usually involves some amount of
email traffic: purchase confirmation, shipping
notification, tracking, and problems. By assigning a
plus-address to each vendor (e.g.,
raelity+amazon@gmail.com or
raelity+shopping+amazon@gmail.com), you can group
all of your online transactions with that vendor.
While there usually isn't anything you can
do about vendors and service providers
sharing your email address with others, at
the very least, you can keep tabs on the
offending party.
Subscribing to mailing lists
There comes a time in any subscriber's life when she
wants to disambiguate email pouring in from various
mailing lists from more important mail. Give every
mailing list its own plus-address (e.g.,
raelity+xmlsomething@gmail.com or
raelity+mailinglist+xmlsomething@gmail.com) and you
can label or siphon incoming mailing list posts into your
Archive.
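The routing decisions described in these uses boil down to "look at the tags on the recipient address and pick an action." The following Python sketch models that logic outside of Gmail entirely; it is not Gmail's filter mechanism, and the RULES table, the action names, and the function name action_for are all hypothetical.

```python
# Hypothetical tag-to-action table; adjust to taste.
RULES = {
    "morningtimes": "trash",
    "mailinglist": "archive",
    "shopping": "label:shopping",
    "rsvp": "label:rsvp",
}


def action_for(recipient, rules, default="inbox"):
    """Pick an action from the first plus-tag that has a rule."""
    local = recipient.rsplit("@", 1)[0]
    tags = local.split("+")[1:]  # everything after the base name
    for tag in tags:
        if tag in rules:
            return rules[tag]
    return default


for address in ("raelity+service+morningtimes@gmail.com",
                "raelity+mailinglist+xmlsomething@gmail.com",
                "raelity@gmail.com"):
    print(address, "->", action_for(address, RULES))
```

In Gmail itself you would express the same idea as one filter per tag on the To: address; the sketch is just a way to reason about which tag wins when an address carries several.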
Hack 73. Import Your Contacts into
Gmail
Data entry's a drag. Export your contacts from an existing web
mail service, desktop email application, or database, and import
them into your Gmail address book.
Possibly the most annoying aspect of moving into any new web
mail home is bringing all your family, friends, and business
contacts along with you. The average end user has almost been
trained not to expect any sort of import utility, instead sighing
and settling in for an evening of data entry.
Gmail, as with most post-1990s web mail applications worth
their salt, provides the facility for importing all those contacts in
just a few clicks; just how many depends on where you're
exporting them from. Gmail accepts only one format:
comma-separated values (CSV). Thankfully, CSV is about as low a
common denominator as you could wish for; Yahoo! Address
Book, Outlook, Outlook Express, Mac OS X Address Book (with
a little help from a free application), Excel, and many other
applications, web or otherwise, speak CSV.
Gmail's Help documentation on the subject of
importing contacts is sure to keep up with the
needs of its users, so keep an eye on "How do I
import addresses into my Contacts list?"
(http://gmail.google.com/support/bin/answer.py?answer=8301).
6.6.1. Anatomy of a Contacts CSV
First, a quick tour of a typical contacts CSV file as consumed by
Gmail's import tool.
CSV files, as the name suggests, are little more than
garden-variety text files in which data is listed one record per
line, each field separated by (you guessed it!) a comma. The
simplest of all contacts.csv files might then look something like this:
name,email
Rael Dornfest,[email protected]
Tara Calishain,[email protected]
...
The first line lists field names, in this case name and email
address. Each line thereafter is a single person or entity
(business, organization, etc.) in your contacts list with a
corresponding name and email address.
Gmail accepts various formats of contact entry, recognizing
some of the more common fields such as name, email address,
phone, birthday, etc. Here's a slightly more detailed
contacts.csv:
first name,last name,email address,phone
Rael,Dornfest,[email protected],(212) 555-1212
...
Notice that name is split into first and last name fields, email is
called email address, and there's a phone field too.
Unless you're going to be using Gmail as your main contacts
database (and I can't quite see why you would), you don't need to
import any more than name and email address (something akin
to the first contacts.csv example) to find it useful.
In fact, at the time of this writing, Gmail
does little with fields beyond name and
email address but shove them into a Notes
field.
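If your contacts already live in a script or database, a few lines of Python can produce the simple two-column file described above. This is only a sketch: the contacts_to_csv name and the example addresses in the usage note are made up for illustration, and the header row follows the first contacts.csv sample rather than any documented Gmail requirement.

```python
import csv
import io


def contacts_to_csv(contacts):
    """Render (name, email) pairs as CSV text with a name,email header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "email"])
    for name, email in contacts:
        # csv.writer quotes any field containing a comma, so a name such as
        # "Dornfest, Rael" still lands in a single column.
        writer.writerow([name, email])
    return buffer.getvalue()
```

Calling contacts_to_csv([("Rael Dornfest", "rael@example.com")]) and writing the result to contacts.csv gives you a file in the same shape as the first sample.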
6.6.2. Feed CSV to Gmail
Assuming that you have a CSV file to work with (if you don't,
read on to the sections below for some guidance), importing is a
snap.
From the main Gmail screen in your web browser, click the
Contacts link (Figure 6-1) found at the bottom of the menu on
the left side of the page.
Figure 6-1. Clicking the Contacts link gets you to
your Gmail contacts
The Contacts page opens, listing all of (or none of, if you don't
yet have any) your existing Gmail contacts. These may have
been entered by hand, gleaned from incoming and outgoing mail,
or imported at some earlier date. Click the Import Contacts
link at the top right of the page.
Click the Browse... (or equivalent) button when prompted to do
so, as shown in Figure 6-2, and find your CSV file on your
computer's hard drive. (Just what this looks like depends on
your operating system and browser, but essentially you're just
choosing a file much like you would from any application.) Click
the Import Contacts button and, Bob's your uncle (that's "tada!"
for my American readers), you should see a confirmation that all
went to plan and you've imported some number of contacts into
your Gmail address book.
Figure 6-2. Finding that CSV file
Click the Return to Contacts link and you'll see your now fully
stocked contacts list. Figure 6-3 shows mine, after importing
the second sample CSV at the beginning of this hack.
Figure 6-3. Feeding that CSV file to Gmail
Delete any number of contacts by clicking their associated
checkboxes and clicking the Delete Selected button. Edit a
contact by clicking the appropriate [edit] link. Or type in a
contact or three by hand using the Add Contact link.
Now, any time you start typing a known contact's name into the
To, Cc, or Bcc field of a new message, Gmail will autocomplete it
for you. No need to remember that cousin Adam is
adamg@ozziesurfers.co.au or Auntie Joan is
joan42@tepidmail.com.
6.6.3. Out of Outlook (Express)
Both Outlook Express and Outlook in Windows can export their
address books as CSV.
In Outlook Express, select File → Export → Address Book...,
choose Text File (Comma Separated Values) as your output
format (see Figure 6-4), and click the Export button.
Figure 6-4. Export your Outlook Express
Address Book as CSV
In Outlook, select File → Import and Export..., choose "Export
to a file" and click Next, select Comma Separated Values
(Windows) as your output format, and click Next again. An
Export Wizard will then guide you the rest of the way to saving
your contacts as a CSV file.
Feed either to Gmail as described earlier.
6.6.4. Hopping out of Hotmail
There are a couple of ways to hop out of Hotmail with your contacts
in tow. The first goes by way of Outlook Express or Outlook and
the second uses a touch of copy-and-paste, as suggested by
the Gmail team in their online Help documentation.
6.6.4.1 By way of Outlook (Express)
As described earlier, both Outlook Express and Outlook are able
to export to CSV. Both are also able to subscribe to Hotmail
accounts and synchronize contacts therewith. Putting two and
two together, you can use Outlook (Express) as an intermediary
as follows.
Set up a new account in Outlook Express or Outlook, choosing
HTTP as the server type and Hotmail as the mail service
provider, as shown in Figure 6-5.
Figure 6-5. Setting up a Hotmail email account in
Outlook Express
In Outlook Express, click the Addresses icon in the toolbar to
open your Address Book. Select Tools → Synchronize Now
(Figure 6-6) to synchronize your contacts between Outlook
Express and Hotmail, thus bringing your Hotmail contacts to
your computer.
Figure 6-6. Synchronizing with Hotmail to grab a
local copy of your contacts
After a few moments of synchronization, your local Address
Book will be up to date and you can export those contacts to
CSV as described earlier in the "Out of Outlook (Express)"
section.
6.6.4.2 By way of copy-and-paste
This is one of those ugly methods that you can't quite knock
because it just plain works.
Log into Hotmail in your web browser of choice and select the
Contacts tab, as shown in Figure 6-7. Click the Print View link in
the Hotmail toolbar.
Figure 6-7. Click the Print View link in the
Hotmail Contacts toolbar
In the Print View window that pops up, highlight everything (click
and drag your mouse) from Name at the top left to the bottommost
row in your list of contacts. Press Control-C or select Edit →
Copy to copy the contacts, as shown in Figure 6-8.
Figure 6-8. Copying your contacts
Open Microsoft Excel, start a new workbook, select the A1 cell,
and type Control-V or select Edit → Paste to paste in your
contacts list. Your workbook should look something like Figure
6-9.
Figure 6-9. Pasting your contacts into an Excel
workbook
Save the workbook as "CSV (Comma delimited)" (never mind
the couple of warnings about incompatibilities that Excel throws at
you) and give the resulting CSV file to Gmail's import tool.
This turns into an unholy mess under Mac
OS X. Contacts are not nicely spread
across columns, leaving you with a row of
contacts, empty cells, and some odd
characters in any CSV file that you
attempt to create.
6.6.5. Yumping from Yahoo!
Yahoo! Address Book exports directly to CSV.
Log into Yahoo! and visit your Address Book (the Addresses
tab). Click the Import/Export link on the top right (Figure 6-10).
Figure 6-10. Using the Yahoo! Address Book's
Import/Export feature
On the Export section of the resulting page, click the Yahoo!
CSV Export Now button (Figure 6-11).
Figure 6-11. Exporting as Yahoo! CSV
Your browser will most likely prompt you for a place to save the
CSV file on your computer's hard drive, as shown in Figure 6-12.
Figure 6-12. Saving the exported CSV file to
your hard drive
Now, go ahead and import that CSV using the Gmail import tool,
described earlier.
I do apologize for the bad "Yumping" pun,
but "Yahoo!" doesn't leave you much room
for alliterated action verbs: yodeling?
yanking?
6.6.6. Moving from .Mac
The Mac OS X Address Book only exports to something called
vCard, which is understood by many contacts applications, but
not by Gmail.
Thankfully, someone's written a magical little app to help.
AddressBookToCSV
(http://homepage.mac.com/kenferry/software.html#AddressBookT
freeware) slurps up all of your contacts (name and email address
only, which is nicer to my mind than uploading a slew of data
unnecessary for your Gmailing needs) out of Address Book and
spits them into a CSV file that you can feed to Gmail. Download
the app, mount the .dmg on your desktop, and run it right from
there, as shown in Figure 6-13. (If you'll likely use it again and
again, go ahead and drag it into your Applications/Utilities folder.)
Figure 6-13. AddressBookToCSV exports
Address Book names and email addresses to
CSV
When prompted to do so, choose a place to save the contacts.csv
file and click the Save button. Close the application using
Command-Q (it doesn't do so by itself when done).
Feed contacts.csv to Gmail as usual.
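If you would rather script the conversion yourself, the name-and-email extraction can be roughed out in Python. To be clear, this is not how AddressBookToCSV works internally (I haven't seen its source); it is a simplified sketch that assumes an unfolded vCard text export, reads only the FN and EMAIL properties, and ignores vCard escaping rules. The vcards_to_csv name is hypothetical.

```python
import csv
import io


def vcards_to_csv(vcard_text):
    """Turn vCard text into name,email CSV, one row per card with an email."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "email"])
    name = email = ""
    for raw_line in vcard_text.splitlines():
        line = raw_line.strip()
        if line.upper() == "BEGIN:VCARD":
            name = email = ""  # start a fresh card
        elif line.upper() == "END:VCARD":
            if email:  # skip cards that carry no email address
                writer.writerow([name, email])
        elif ":" in line:
            key, value = line.split(":", 1)
            # Drop parameters (";type=...") and any group prefix ("item1.").
            prop = key.split(";")[0].rsplit(".", 1)[-1].upper()
            if prop == "FN":
                name = value
            elif prop == "EMAIL" and not email:
                email = value  # keep only the first address on the card
    return out.getvalue()
```

Write the returned text to contacts.csv and hand it to Gmail's import tool as with any other CSV.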
6.6.7. Hand-Crafting a CSV
If your contacts exist in some form with no obvious path to CSV,
you can always export them in any way you can, arriving at some
point at either a plain-text file that you can manipulate by
hand (tedious, but possible) or something Excel can read. If you
can get to Excel, you can get to CSV; massage the data into a
form similar to that discussed at the top of this hack, select File →
Save As... and save as "CSV (Comma delimited)."
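When the plain-text file is nothing more than one address per line, the hand-massaging can be scripted. The sketch below assumes each line looks like either "Rael Dornfest <rael@example.com>" or a bare address; that input shape and the address_lines_to_csv name are assumptions for illustration, not anything Gmail prescribes.

```python
import csv
import io
from email.utils import parseaddr


def address_lines_to_csv(text):
    """Convert lines of 'Name <address>' (or bare addresses) to name,email CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "email"])
    for line in text.splitlines():
        name, address = parseaddr(line.strip())
        if "@" in address:  # skip blank lines and anything unparseable
            writer.writerow([name, address])
    return out.getvalue()
```

Lines without a recognizable address are silently dropped, so it is worth comparing the row count against your original list before importing.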
6.6.8. Last-Ditch Effort
If, for whatever reason, you can't massage your contacts into
CSV form or use Gmail's Import Contacts tool, there is a
(admittedly grotty) way to get all your contacts to Gmail using
email itself.
Send out a single email message (preferably one that announces
your intention) to (on the To: line) your Gmail account (or one
that forwards to your Gmail account), copying all your contacts
on the Cc: line.
You should probably batch these such that
there's some semblance of privacy, with
your family not seeing all of your business
associates' addresses and vice versa.
Send a separate message for contacts of a
sensitive nature.
When you receive that message at Gmail, open it and choose
"Reply to all." Write something explanatory again and send it off.
Gmail automatically adds to your contact list the names and
email addresses of the people you send email to from Gmail, so
you've just added all of those people to your Gmail address
book.
Again, this is a rather annoying way
(annoying to your friends, family, and
business contacts) to get your contacts
list to Gmail, so it should be regarded as a
last-ditch effort.
Rael Dornfest and Justin Blanton
Hack 74. Import Mail into Gmail
Moving to Gmail doesn't have to mean starting from scratch.
Forward mail in bulk from your computer or other web mail
service to your Gmail account.
The most enticing feature of Gmail is probably its ability to
perform Google-style searches on your own inbox. The one
gigabyte of free space is intriguing, but it's not much when you
consider that you have far more than that available to you on
even your most outdated PC. And I'd warrant that not even its
snazzy JavaScripted user interface is enough to tear you away
from your existing web mail service, uprooting yourself and
starting over.
Gmail doesn't currently provide any way to import your existing
email archive (web mail service or desktop mailbox). While you
already might have considered forwarding all that mail to your
Gmail account, just how to do so (even just the few hundred
"important" messages) is quite a trick.
Not so, thanks to hacks like the Google Mail Loader for
forwarding desktop mail and web mail intermediaries YPOPs! for
Yahoo! Mail and GetMail for Hotmail and MSN email.
6.7.1. Forward Desktop Mail
The Google Mail Loader (http://www.marklyon.org/gmail; GNU
Public License) is a point-and-click application that reads your
existing mail files on your computer and forwards the messages
on to Gmail (one every two seconds, so as not to overload or
otherwise annoy the Gmail servers). It does so without deleting
mail from your local computer; what's sent to Gmail is a copy of
each and every message. You can even set it to drop uploaded
messages into your Gmail Inbox or Sent Mail folder.
GML is cross-platform and understands multiple mailbox
formats:
Mbox (used by Netscape, Mozilla, Thunderbird, and
many other email applications)
MailDir (Qmail and others)
MMDF (Mutt)
MH (NMH)
Babyl (Emacs RMAIL)
Microsoft Outlook, via a utility such as PST Reader
(http://www.mailnavigator.com/reading_ms_outlook_pst_fil
which converts Outlook's PST files to Mbox format
6.7.1.1 Installing the hack
Download either the Windows version or the source-only version
for Linux/Mac OS X
(http://www.marklyon.org/gmail/download.htm). The Windows
version is definitely the simplest version to set up and use,
requiring no prerequisites or other bits and pieces.
The source version assumes you have the Python scripting
language and the Python Mega Widgets
(http://pmw.sourceforge.net) toolkit installed.
The ins and outs of installing GML and all
the prerequisites from source are beyond
the scope of this book. If you need help,
consult the documentation for Python
(http://www.python.org) and Python Mega
Widgets (http://pmw.sourceforge.net), or
ask your local technical guru or system
administrator.
If, on the other hand, you have Python on
your system and don't much care whether
the Google Mail Loader is a desktop or
command-line application, skip ahead to
the "Hacking the hack" section.
6.7.1.2 Running the hack
Since Google Mail Loader works directly with your email
application's mailboxes, you'll need to figure out where they live
before you can go much further. Consult your email app's
preferences or documentation or just dig around, both on your
hard drive and by googling for "outlook express" mailbox files
location, replacing "outlook express" with the name of your email
program.
You'll also need to make sure that your mailbox files are in a
format that Google Mail Loader can read, as listed in the
beginning of this hack. If there's any conversion to do, do so
now. For instance, use PST Reader
(http://www.mailnavigator.com/reading_ms_outlook_pst_files.html)
to turn Outlook PST files into Mbox format.
With mailbox files in hand, launch Google Mail Loader by double-
clicking gmlw.exe on Windows or typing python gmlw.py on the
Unix or Mac OS X command line. Figure 6-14 shows Google Mail
Loader running under Linux.
Figure 6-14. Google Mail Loader
Work your way down the settings on the left half of the GML
window:
1. The default SMTP server (that's the sendmail server, the
one used to send your messages to Gmail) of
gsmtp57.google.com works for most users. If, for some
reason, you are required by your local network
administrator or Internet service provider to use their
outgoing mail server, replace the default with the
appropriate address. If your outgoing mail server
requires authentication, click the Requires
Authentication checkbox and fill in your username and
password.
2. Click the Find button and point GML at your mailbox file.
If your email application uses MailDir format, select any
file inside your MailDir directory.
3. From the File Type pull-down menu, choose your mailbox
type (Figure 6-15). There are two versions of the Mbox
format: one is stricter about the format of files and
therefore more accurate, while the other is more lenient
and works better on some Mbox files.
Figure 6-15. Select your mailbox file type
For some of the history, read Jamie Zawinski's "mail
summary files" at http://www.jwz.org/doc/mailsum.html.
If you don't know what format your mail application uses, try
googling for mail format pine, replacing pine with your mail app's
name. (Pine uses Mbox, by the by.)
GML is able to upload both your incoming and outgoing mail.
Choose Mail I Received from the Message Type pull-down menu,
and messages will be dropped into your Gmail Inbox and appear
to be from the original sender, just as they did in your email
application's mailbox. If you choose Mail I Sent, the messages
will be relabeled as coming from your Gmail address and appear
in your Gmail Sent Mail folder.
Gmail automatically labels incoming
messages as Inbox. There's no way,
unfortunately, for an external application to
change this behavior, so messages
imported as Mail I Sent will be labeled as
both Sent Mail and Inbox and appear in
both places. Mind you, there is actually only
one copy of the message stored, and sent
mail is relabeled so as to appear to be
from your Gmail address, not your old
email address.
If you Archive the copy you see in your
Gmail Inbox, it will then appear only in
Sent Mail (and Archive, of course).
Finally, type in your full Gmail address (e.g.,
hank@gmail.com).
Click the Send to Gmail button and the application will start
sending messages, one every two seconds. The delay is
necessary to prevent flooding of Google's servers.
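The send-and-pause loop is easy to picture in code. What follows is not GML's own source, just a minimal Python sketch of the same idea built on the standard mailbox and smtplib modules. The forward_mbox and make_smtp_sender names, the port, and the envelope sender address are all assumptions for illustration; you would supply whichever SMTP server you entered in the settings above.

```python
import mailbox
import smtplib
import time


def forward_mbox(mbox_path, gmail_address, send, delay_seconds=2.0, sleep=time.sleep):
    """Hand each message in an Mbox file to send(), pausing between messages.

    Returns the number of messages passed to send().
    """
    box = mailbox.mbox(mbox_path, create=False)
    sent = 0
    try:
        for message in box:
            if sent:
                sleep(delay_seconds)  # throttle so the server isn't flooded
            send(gmail_address, message)
            sent += 1
    finally:
        box.close()
    return sent


def make_smtp_sender(host, port=25, envelope_from="importer@example.com"):
    """Build a send(recipient, message) function that relays over SMTP.

    host, port, and envelope_from are placeholders, not known-good values.
    """
    def send(recipient, message):
        with smtplib.SMTP(host, port) as server:
            server.sendmail(envelope_from, [recipient], message.as_bytes())
    return send
```

Passing the sender in as a function keeps the loop testable without a network connection; a real run would look something like forward_mbox("Inbox.mbox", "hank@gmail.com", make_smtp_sender("smtp.example.com")), with the path and server name being your own.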
If you're interested in the details, click the Save Log button to
save the contents of the output window to a file for later review.
There are, as with any hack of this sort, some issues worth
noting:
The timestamp of imported messages in your Gmail
Inbox will be that of when the message was received by
Google. Inside the message itself, however, the original
date is still preserved. You can search for parts of dates
to retrieve matching messages: Aug 94, for instance, will
find all messages from August of 1994.
The count of messages in your Inbox will not match the
number GML reports as sent. This is due to the fact that
the number Gmail reports is the number of new threads,
not individual messages. Gmail automatically groups
related messages as they arrive.
Some people, especially users of Mozilla or Firefox,
report problems with their Mbox files being corrupt. I
have tracked down a Python script
(http://www.marklyon.org/gmail/cleanmbox.py) that'll
clean up most of these problems.
Importing mail from Outlook is a bit spotty. I
recommend one of two things: import your Outlook mail
into Outlook Express and then into the open source
Thunderbird mail application
(http://www.mozilla.org/products/thunderbird/), or use
PST Reader or the like to convert your Outlook mail to
Mbox.
6.7.1.3 Hacking the hack
If you're a command-line jockey or don't particularly relish
installing the various prerequisites (Tk, Python Mega Widgets)
necessary to get the graphical version of Google Mail Loader
running, there's also a text-only version available at
http://www.marklyon.org/gmail/old/default.htm.
The only requirement for the command-
line GML is Python
(http://www.python.org).
Here's a sample session with the older GML on the Mac OS X
command line:
$ python gml.py
Mbox & Maildir to Google Mail Loader (GML) by Mark Lyon <mark@mark
Usage: gml.exe [mbox or maildir] [mbox file or maildir path] [gmai
SMTP Server]
Exmpl: gml.exe mbox "c:\mail\Inbox" [email protected]
Exmpl: gml.exe maildir "c:\mail\Inbox" [email protected] gsmtp171
$ python gml.py mbox ~/Library/Mail/Mailboxes/1999.mbox/mbox 'hank
Mbox & Maildir to Google Mail Loader (GML) by Mark Lyon <mark@mark
Done. Stats: 1 success 0 error 0 skipped.
6.7.2. Migrate from an Existing Web Mail
Service
Despite attempts by your existing web mail service to entice
you to stay, Gmail beckons with its one gigabyte of storage,
powerful search, rich web interface, and chance of grabbing a
better email address than raelity973@. That said, you're loath to
leave behind the last year or three's email.
Well, you can indeed take it with you, thanks to some nice
donateware Web-to-POP mail utilities. These intermediaries
operate in one of two ways:
The utility sits between your desktop email application
and web mail service, allowing you to download all of
your mail to your computer, after which you can use the
Google Mail Loader to feed it all to Gmail.
The utility combines these two steps into one, grabbing
all of your web mail and forwarding it on in bulk to your
Gmail account.
While there are no doubt any number of these utilities, two we
stumbled across were GetMail and YPOPs!
Of course, you may just opt to pay for POP
mail access to your web mail service,
download all your mail like you would any
other, and use the Google Mail Loader from
there. If, however, you've gone this long
without paying for POP service, chances
are you're not going to do so now just to
move out of the service.
6.7.2.1 Hop from Hotmail/MSN
GetMail (http://www.e-eeasy.com/GetMail.aspx; donateware) is
a two-in-one for Hotmail and MSN that runs under Windows.
Move any messages that you want to send across to your Gmail
account to your Hotmail Inbox (if you've previously filed them
elsewhere) and mark them as unread.
Launch GetMail (shown in Figure 6-16), provide it with your
Hotmail/MSN account name and password, and type your full
Gmail address into the Forward To box. Check whatever options
you prefer; I'd uncheck the Delete checkbox. Now click Check
for New Mail to set GetMail in motion and go get a cup of coffee
while it moves all those messages across for you. You can even
leave it running, transferring your Hotmail messages to Gmail on
an ongoing basis.
Figure 6-16. GetMail can download
Hotmail/MSN messages and forward them on to
Gmail
6.7.2.2 Yank your Yahoo! Mail
YPOPs! (http://yahoopops.sourceforge.net/; donateware) is a
POP mail proxy, sitting between your preferred email application
and Yahoo! Mail. It is available for Windows, Mac OS X, Linux,
and Solaris. The Windows version self-installs while the others
require that you compile from source code and so are a little
more difficult for the uninitiated to get up and running.
Move any messages you want to download and carry across to
Gmail into your Yahoo! Mail Inbox and mark them as unread.
On Windows, run YPOPs! after installation. A little icon appears
in your Windows taskbar; double-click it to get to the settings,
shown in Figure 6-17.
Figure 6-17. YPOPs! proxies POP mail requests
While you can go ahead and make a few changes in the settings,
YPOPs! runs right out of the box without any further
configuration.
Now, simply set up a POP mail account like any other, only
pointing to YPOPs! running locally as your mail server, both
incoming and outgoing. The YPOPs! site has details on
configuring most email clients at
http://yahoopops.sourceforge.net/modules.php?
op=modload&name=Sections&file=index&req=listarticles&secid=1
Once you have downloaded all of your web mail to your computer,
use the Google Mail Loader to send all the contents of your
local inbox to Gmail.
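If you'd rather not involve a desktop mail client at all, any POP-speaking script can pull messages through a local proxy and drop them straight into an Mbox file for Google Mail Loader. The following Python sketch uses the standard poplib and mailbox modules; the 127.0.0.1 host and port 110 are assumptions (check the YPOPs! settings for the values your copy actually listens on), and both function names are hypothetical.

```python
import mailbox
import poplib


def download_pop_to_mbox(pop, mbox_path):
    """Copy every message from an open POP3 connection into an Mbox file.

    Returns the number of messages copied.
    """
    box = mailbox.mbox(mbox_path)
    count = 0
    try:
        message_total, _mailbox_size = pop.stat()
        for number in range(1, message_total + 1):
            _response, lines, _octets = pop.retr(number)
            box.add(b"\n".join(lines) + b"\n")  # reassemble the raw message
            count += 1
        box.flush()
    finally:
        box.close()
    return count


def connect_to_local_proxy(user, password, host="127.0.0.1", port=110):
    """Log in to a POP3 server; the default host and port are assumptions."""
    pop = poplib.POP3(host, port)
    pop.user(user)
    pop.pass_(password)
    return pop
```

The sketch never issues a delete command, so the messages stay on the server; remember to call quit() on the connection when you're finished.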
6.7.3. See Also
GmailerXP (http://gmailerxp.sourceforge.net;
donateware) is the be-all and end-all of Gmail/Windows
integration, providing a full-featured frontend to your
Gmail email, importing and uploading legacy messages
to Gmail, new mail notification, and so on.
Mark Lyon, Justin Blanton, and Rael Dornfest
Hack 75. Export Your Gmail
Back up or export your Gmail messages to your computer for
safe-keeping or offline reading.
You're nicely settled in to your new Gmail account and may even
have brought over all of your email [Hack #74] since time began.
You're mailing up a storm, taking full advantage of the one
gigabyte of storage space you're allotted.
What, now, if you decide Gmail actually isn't for you and you'd
like to move out again, either to another web mail service or
back to the more traditional email application running on your
computer? Or perhaps you just want a local archive of your
Gmail for safe-keeping or offline trawling when you're on a plane
and desperately need a copy of that meeting report.
A nifty little archiving script packaged with the libgmail
(http://libgmail.sourceforge.net) Python interface to Gmail
[Hack #80] is just the ticket. It logs into your Gmail
account for you, looks around, prompts you to select a collection
of messages to archive, and downloads them to your laptop or
desktop.
6.8.1. Installing the Hack
There's really nothing to do beyond downloading
(http://sourceforge.net/project/showfiles.php?
group_id=113492, or click the Downloads link on the libgmail
home page) and unstuffing the libgmail archive
(http://libgmail.sourceforge.net).
The only requirement for libgmail is Python
(http://www.python.org).
6.8.2. Running the Hack
Among libgmail's demo applications is archive.py, a script that
logs into Gmail, downloads your email messages, and saves
them on your computer's hard drive in a format (Mbox) suitable
for importing into many an email program.
On the command line (whether that be the Windows DOS-alike,
Mac OS X's Terminal, or Unix shell), run the archive script like
so:
$ python demos/archive.py
You'll be prompted for your Gmail account name and password,
after which libgmail will log you in:
Gmail account name:
raelity
Password:
Please wait, logging in...
Log in successful.
There we are. At this point you can choose to archive just what's
in your inbox (0), all messages (2), starred, drafts, sent, or a
particular set of labeled messages (6 and 7 in my case). Choose
the associated number and hit the Return key on your keyboard:
WARNING:root:Live Javascript and constants file versions differ.
Select folder or label to archive: (Ctrl-C to exit)
Note: *All* pages of results will be archived.
0. inbox
1. starred
2. all
3. drafts
4. sent
5. spam
6. foo
7. Peeps
Choice: 2
Libgmail begins slurping your messages out of Gmail, one by
one, and downloading them to an archive file in the current
directory on your computer.
As is stated by the program at the outset,
"*All* pages of results will be archived,"
meaning that all messages in the
collection you've chosen will be
downloaded, not just those that fit on a
single page when you're looking at that
collection through the standard Gmail web
browser interface.
ff602fe48d89bc3 1 \<b\>Hello from Hotmail\</b\>
ff602fe48d89bc3 1 Hello from Hotmail
ff5fb9c2829c165 1 Hello Gmail via Gmail Loader
ff5fb9c2829c165 1 Hello Gmail via Gmail Loader
ff5691f7170cb62 1 Hello Gmail via Gmail Loader
ff5691f7170cb62 1 Hello Gmail via Gmail Loader
ff3f4310237b607 1 Howdy gmail-lite
ff3f4310237b607 1 Howdy gmail-lite
ff39c1fc71abbf1 1 Hello from Gmail mobile
ff39c1fc71abbf1 1 Hello from Gmail mobile
...
fbd0c388dd1684e 1 Hello, Gmail
fbd0c388dd1684e 1 Hello, Gmail
fbd0c1db3bcffe2 1 Gmail is different. Here's what you need to know
fbd0c1db3bcffe2 1 Gmail is different. Here's what you need to k
Select folder or label to archive: (Ctrl-C to exit)
Note: *All* pages of results will be archived.
0. inbox
1. starred
2. all
3. drafts
4. sent
5. spam
6. foo
7. Peeps
Choice: ^C
Done.
And we're done. Choose another collection to download and
archive if you wish; otherwise, press Control-C on your keyboard
to stop the archive.py script.
Now, if you look in the directory from which you invoked
archive.py, you should see a new Mbox-format archive (the one I
just created is archive-all-1096849647.72.mbox) of your chosen
collection of messages, suitable for importing into many an
email program:
jane:~/Desktop/libgmail-0.0.8 rael$ ls
ANNOUNCE constants.pyc
CHANGELOG demos
README libgmail.py
archive-all-1096849647.72.mbox lib
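Before importing the archive anywhere, you may want to peek inside it. Python's standard mailbox module reads Mbox files, so a short sketch like the one below (the summarize_mbox name is made up here, and it assumes the archive is a well-formed Mbox file) lists the date, sender, and subject of each saved message:

```python
import mailbox


def summarize_mbox(mbox_path):
    """Return (date, sender, subject) header tuples, one per archived message."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        # A missing header comes back as None rather than raising an error.
        return [(m["Date"], m["From"], m["Subject"]) for m in box]
    finally:
        box.close()
```

Run against the archive created above, summarize_mbox("archive-all-1096849647.72.mbox") should return one tuple per downloaded message.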
6.8.3. See Also
gmail.py
(http://www.holovaty.com/blog/archive/2004/06/18/1751)
is a simple Python interface to Gmail, focusing on
exporting raw messages for backup and import.
Hack 76. Take a Walk on the Lighter Side
Gmail with grace from any web browser, whether JavaScript-
disabled, not yet supported, text-only, or on a PDA or mobile
phone.
Being a child of Google, Gmail hides all of its complexity behind
a rich, deep, feature-packed yet user-friendly web mail interface,
assuming you have the right browser (one of recent vintage) for the
job. But what to do if your IT department hasn't upgraded your
version of the Internet Explorer browser since Windows 95,
you're quite happy with the text-only Lynx browser, you're
running the latest nightly build of browser XYZ, which Gmail
simply doesn't like, or you're trying to reach your mail from a
PDA or smartphone?
Gmail-lite (home page: http://sourceforge.net/projects/gmail-
lite, SourceForge project page:
http://sourceforge.net/projects/gmail-lite; GNU Public License),
as the name suggests, puts a plain HTML face on Gmail. It is a
PHP application that proxies your interactions with Gmail,
allowing you to surf using whatever browser you have at hand or
just plain prefer, while keeping Gmail happy with
its end of the conversation.
The authors of gmail-lite have done a fantastic job, affording you
a plain HTML interface to just about every bit of functionality
Gmail provides through its more interactive JavaScript-based
frontend.
6.9.1. Installing the Hack
You have to marvel at the wonders of PHP-based applications
and their simple installation. Assuming you have the
prerequisites taken care of, it's just a matter of downloading,
unpacking, and enjoying. I installed gmail-lite both on my local
Mac OS X laptop and under my hosted ISP account in seconds
each.
Gmail-lite assumes you have PHP
installed on a web server. It relies upon the
libgmailer library
(http://gmail-lite.sourceforge.net), included for your
convenience in the gmail-lite distribution.
You also need the curl library
(http://www.php.net/curl) with SSL support
(http://www.openssl.org) since gmail-lite
always talks to Gmail over a secure
channel.
Download gmail-lite (http://sourceforge.net/projects/gmail-lite)
and unpack the distribution (0.56 at the time of this writing, but
yours is sure to be a later version) somewhere under your web
server's document root, where the rest of your web site lives
(ask your system administrator or service provider if you're not
sure where this is):
$ tar -xvzf gmail-lite-0.56.tar.gz
gmail-lite-0.56/
gmail-lite-0.56/compose.php
gmail-lite-0.56/config.php
gmail-lite-0.56/debug.php
gmail-lite-0.56/diagnose.php
gmail-lite-0.56/dl.php
gmail-lite-0.56/docs.html
gmail-lite-0.56/index.php
gmail-lite-0.56/INSTALL
gmail-lite-0.56/libgmailer.php
gmail-lite-0.56/logout.php
gmail-lite-0.56/main.php
gmail-lite-0.56/star.gif
$ mv gmail-lite-0.56 gmail-lite
That last command simply renames the directory to something
that will be a little friendlier and easier to remember when it
comes to visiting in my browser.
To verify that everything went to plan, point your web browser at
diagnose.php using the URL corresponding to the gmail-lite
directory on your web site, e.g.,
http://www.example.com/~rael/gmail-lite/diagnose.php. The
resulting page should look like Figure 6-18.
Figure 6-18. The diagnose.php script makes sure
that everything is installed as expected
If diagnose.php does indicate that something's gone wrong,
consult the installation and troubleshooting documentation in
the INSTALL text file in your gmail-lite folder.
6.9.2. Running the Hack
Point your computer's web browser at the URL corresponding to
the gmail-lite directory on your web site, e.g.,
http://www.example.com/~rael/gmail-lite (or just click the "Press
here..." link on the diagnose.php page).
Depending on your setup, you may actually need to tack
/index.php on to that URL, but most PHP-enabled servers know
to look for and serve up index.php as a default when no filename
is specified and there's no static index.html in sight. The
gmail-lite package includes just such an index.php file.
Figure 6-19 shows the plain HTML login screen as it will appear
in a typical browser window. Enter your Gmail login (e.g.,
[email protected]) and password, alter the time zone if you feel
so inclined, and click the "sign-in" button.
Figure 6-19. A gloriously plain HTML Gmail login
page in a typical browser window
You're greeted with a summary page with links to your Gmail
Inbox, Sent, Trash, and Spam folders, Starred messages, and
personal labels. Figure 6-20 shows the summary page as it
appears in a smartphone XHTML web browser.
Figure 6-20. A summary of the state of your
Gmail account as it appears in a smartphone
XHTML browser
Click the Inbox link and you'll be presented with a simple list of
your incoming messages, as shown on a Pocket PC in Figure
6-21. At the top is a quick-link toolbar and pull-down menus for
switching views, exploring labeled mail, and searching your
Gmail messages ["Gmail Search Syntax" earlier in this chapter].
(Click the Get button after making your selections or entering a
search query.) At the bottom are actions you can apply to any
number of checked messages (check the associated checkbox
to the right of a message subject to act upon it), including
archive/unarchive, label, star/unstar, mark as read/unread,
mark/unmark as spam, and trash/untrash. (Click the Do button
to apply any action.)
Figure 6-21. Your Gmail inbox, as seen through
a Pocket PC
Select any email message to open it. Figure 6-22 shows a
typical email message viewed in the text-only Lynx browser. The
layout of individual message pages is much like that of the
Inbox (or any folder) view. As you can see in the figure, I'm
about to take a look at all my "foo"-labeled messages.
Figure 6-22. An individual message in the Lynx
text-only browser
So, there you have it: a plain old HTML interface to about
anything you can do through Gmail proper. (In all likelihood, by
the time you read this, Gmail will have built its own plain HTML
version without all the browser and JavaScript requirements.
Still, this is a great hack and worth fiddling about with.)
The gmail-lite author does caution that "GMail is a still in beta,
and GMailer (along with gmail-lite) is, I would say, an `alpha
hack' of a beta software. So don't expect it to work all the time,
and do not build critical mission applications upon it"
(http://gmail-lite.sourceforge.net/).
6.9.3. See Also
If you're wanting to Gmail from a mobile phone with only
a very basic WAP browser on board, you can still Gmail
on the go with gmail-mobile [Hack #77]. And be sure to
check out [Hack #67] for taking Google search along too.
Hack 77. Gmail on the Go
You can take it with you ... Gmail on your mobile phone, that is.
Web mail means never having to say you're sorry that you left
your laptop at home. While I can't quite fathom it myself (I keep
a lot I need beyond basic email on my laptop), there are those
that wander the world sans the very core of the mobile office.
They're happy to use OP's (other people's). "Where there's a web
browser, there's a way" is their credo, and for those who can
swing it, more power to them.
Where this falls down for me are the between times: dashing to a
meeting without the latest agenda in hand (it's in my email
inbox, but my laptop's in my bag and there's no wireless network
in sight), meandering a foreign city and wanting to keep in touch
with the folks back home but without having to lug around a
laptop, and other moments such as these.
The browser experience on even the smartest of smartphones
has a way to go. And most folks don't have any more of the
Internet on their phones than a basic text-only WAP view of the
world [Hack #67]. While WAP works to some degree, web mail
services don't tend to spend much time, if any at all, on
providing a WAP interface to your email.
But there's always a workaround ...
Gmail-mobile (http://sourceforge.net/projects/gmail-mobile;
GNU Public License) is a PHP (http://www.php.net) application
that sits on your web site, between your mobile phone's WAP
browser and Gmail, brokering requests on your behalf and
returning a mobile-appropriate view of your Gmail mail.
This hack assumes you have an account that allows WAP access
to the wild, woolly Web from your mobile phone. Check with
your mobile operator about your data plan, and don't forget to
ask what you're charged per megabyte, because even the
lightest of interactions can add up over time.
You can catch a quick status update, read, and even reply to
your Gmail, and there are more features promised.
6.10.1. Installing the Hack
Installing gmail-mobile is a piece of cake; I installed it under
both Mac OS X and Linux in a matter of seconds each.
Gmail-mobile assumes you have PHP installed on a web server
running on port 80 (the WAP, and indeed web, default). You
also need the curl library (http://www.php.net/curl), which
gmail-mobile uses to talk to Gmail over the Web, and the
libgmailer (http://gmail-lite.sourceforge.net) [Hack #80] library,
included for your convenience in the gmail-mobile distribution.
Download gmail-mobile (http://sourceforge.net/projects/gmail-mobile)
and unpack the distribution (0.11 at the time of this
writing, but yours is sure to be a later version) somewhere under
your web server's document root, where the rest of your web site
lives (ask your system administrator or service provider if
you're not sure where this is):
$ tar -xvzf gmail-mobile-0.11.tar.gz
gmail-mobile-0.11/
gmail-mobile-0.11/AUTHORS
gmail-mobile-0.11/COPYING
gmail-mobile-0.11/INSTALL
gmail-mobile-0.11/README
gmail-mobile-0.11/TODO
gmail-mobile-0.11/compose.php
gmail-mobile-0.11/config.php
gmail-mobile-0.11/index.php
gmail-mobile-0.11/libgmailer.php
gmail-mobile-0.11/logout.php
gmail-mobile-0.11/main.php
gmail-mobile-0.11/star.gif
$ mv gmail-mobile-0.11 gmail-mobile
That last bit renamed the gmail-mobile directory to something a
little easier to type on my mobile phone's keypad.
And you're done. No, really, I was surprised too at just how easy
it was.
By default, gmail-mobile uses browser cookies to maintain state
between requests to Gmail's servers. If you have PHP Session
(http://www.php.net/session) installed, you can choose to use it
instead of cookies. Just uncomment the appropriate line in the
config.php file in your newly unpacked gmail-mobile directory
and comment out the default. Here, I've left things as they were,
using the cookie default:
<?php
require_once("libgmailer.php");
/** Session handling method. You must at least choose (uncommen
/**** have PHP Session installed, prefer to use cookie to store
//$config_session = (GM_USE_PHPSESSION | GM_USE_COOKIE);
/**** have PHP Session installed, prefer NOT to use cookie **/
//$config_session = (GM_USE_PHPSESSION | !GM_USE_COOKIE);
/**** do not have PHP Session installed **/
$config_session = (!GM_USE_PHPSESSION | GM_USE_COOKIE);
?>
6.10.2. Running the Hack
With the easy part out of the way (isn't it wonderful when
installation and configuration are the easy part?) you're ready to
break out your mobile phone's browser and muddle through
typing on that minute keypad.
Before trying this out from your mobile phone (and to remove one
variable in case something doesn't work as expected), point
your computer's web browser at a URL corresponding to the
gmail-mobile directory on your web site, e.g.,
http://www.example.com/~rael/gmail-mobile.
You may actually need to tack /index.php on to that URL, but
most PHP-enabled servers know to look for and serve up
index.php as a default when no filename is specified and there's
no static index.html in sight. The gmail-mobile package
includes just such an index.php file.
Your browser will respond in one of two ways. Either it'll serve up
the raw WML source delivered by gmail-mobile, as shown in
Figure 6-23, or it'll throw up its hands in confusion and prompt
you to save the source as a file on your hard drive. If the source
(displayed in your browser or saved and opened using something
like TextEdit on Mac OS X or Notepad on Windows) looks
something like Figure 6-23 and doesn't seem to report any PHP
or other errors, you're ready to switch to your mobile phone.
Figure 6-23. Raw Gmail Mobile WAP as viewed
through a regular browser
Launch your mobile phone's WAP browser [Hack #67] and key in
the appropriate URL to reach the gmail-mobile directory on your
web site, as above.
After a few moments of churning (WAP is lightweight, but most
mobile bandwidth is on the light side too), you should be greeted
with a login screen (Figure 6-24, left). Key in your Gmail login
([email protected]) and password, alter the time zone if you feel
so inclined, and click OK. Just where you find OK will vary from
phone to phone, WAP browser to WAP browser. I found it under
the left soft key (Service options > OK) on my Nokia Series
60 phone, as shown in Figure 6-24, right.
Figure 6-24. Log in to Gmail Mobile from your
mobile's WAP browser (left) and click OK
(right)
A few more moments of churning and you should see a summary
view of your Gmail account (Figure 6-25, left). To visit any of the
folders, navigate over the appropriate link and select it, much as
you would links in a regular browser, albeit with esoteric
keystrokes rather than a mouse. Figure 6-25, right, shows my
rather empty inbox.
Figure 6-25. Take a gander at a summary of the
state of your Gmail (left) and visit your inbox
(right)
Visit any message (Figure 6-26, left, shows a sample email
message) in any of your mailboxes by selecting its link.
Compose a new message by selecting the Compose link; reply
using the Reply link at the bottom of a message. Figure 6-26,
right, shows the composition window in action.
Figure 6-26. Read (left) and respond to (right)
Gmail mail on the go
While you can't (at least, at the time of this writing) create, alter,
or delete Gmail labels, you can see what they are (Figure 6-27,
left) by following the Labels link on the Summary screen. Figure
6-27, right, shows all of my messages labeled "Peeps."
Figure 6-27. Browse your Gmail labels (left)
and visit labeled messages (right)
It's not the spiffy tricked-out Gmail interface that you've come
to expect, but it's a great way to take your Gmail with you and,
quite frankly, it's better than some of the mobile email
applications that I've come across.
6.10.3. See Also
The gmail-mobile project has on its to-do list just about
anything you're currently wishing for, including search,
archive, delete, forward, label, mark as spam, and
working with the Gmail address book [Hack #73]. Keep an
eye on the project page
(http://sourceforge.net/projects/gmail-mobile) for the
latest news and distributions.
If you're new to mobile browsing, you might want to take
a gander at [Hack #67].
Hack 78. Use Gmail as a Linux
Filesystem
Repurpose your gig of Gmail as a networked filesystem.
What I wouldn't give for a spare gig of networked filesystem on
which to stash a backup of my work in progress or as an
intermediary between two firewalled systems (thus not directly
reachable from one to the other).
GmailFS (http://richard.jones.name/google-hacks/gmail-filesystem/gmail-filesystem.html)
puts your gigabyte of Gmail storage to work for just such a
purpose. It provides a mountable Linux filesystem repurposing
your Gmail account as its storage medium.
GmailFS is a Python application that uses the FUSE
(http://sourceforge.net/projects/avf) userland filesystem
infrastructure to help provide a filesystem and the libgmail
(http://libgmail.sourceforge.net) [Hack #80] library to
communicate with Gmail.
GmailFS supports most file operations, such as read, write,
open, close, stat, symlink, link, unlink, truncate, and rename.
This means that you can use the lion's share of your favorite
Unix command-line tools (cp, ls, mv, rm, ln, grep, et al.) to operate
on files stored on Google's Gmail servers.
So, what might you store on and do with the Gmail filesystem?
About anything that you would with any other (possibly
unreliable) networked filesystem built on a cool hack or three.
Figure 6-28 shows the Firefox web browser launched from an
executable stored as a message in my Gmail account.
Figure 6-28. Reading my Gmail via the Firefox
web browser launched from an executable
stored on the selfsame Gmail account
This is my first foray into Python and I'm sure the code is far
from elegant. That said, the language has a reputation as an
excellent choice for rapid prototyping, and this was borne out in
my experience. The first working version of GmailFS took about
two days of coding with an additional day and a half spent on
performance tuning and bug fixing. Given that this includes the
learning curve of the language itself, the reputation seems well
deserved.
A special mention should go to libgmail and FUSE, as both
greatly contributed to the short development time.
(I'm particularly concerned with my attempts to manipulate
mutable byte arrays. I'm sure that there must be a less clumsy
way of doing it than the nasty list-to-array-to-string path that
I'm currently using.)
So, do be careful using the GmailFS and certainly don't use it for
anything important.
6.11.1. Implementation Details
All meta-information in the GmailFS is stored in the subjects of
emails sent by the Gmail user to themselves.
This was not as good an idea as I'd first thought. I thought I
could speed things up by grabbing the message summary without
having to download the entire message, as Gmail elides subjects
(abbreviates them and adds ellipses) to fit them on the screen,
but it turned out that I needed to get the full message anyway.
(Yes, the message bodies are empty, but it does add
considerable latency to operations such as listing the contents
of a large directory.)
The actual file data is stored in attachments. Files can span
several attachments, allowing file sizes greater than the
maximum Gmail attachment. File size should be limited only by
the amount of free space in your Gmail account.
There are three types of important structures in the GmailFS:
Directory and file entry structures hold the parent path
and name of files or directories. Symlink information is
also kept here. These structures have a reference to the
file's or directory's inode (a data structure holding
information about where and how the file or directory is
stored).
Inode structures hold the kind of information usually
found in a Unix inode, such as mode, uid, gid, size, etc.
Data block structures are one of three types of
messages GmailFS uses to store information related to
the filesystem. The subject of the messages holding
these structures contains a reference to the file's inode
as well as the current block number.
As GmailFS can store files longer than the maximum
Gmail attachment size, it uses block numbers to refer to
the slice of the original file that this data block message
refers to. For example, if you have a blocksize of 5 MB
and a file 22 MB long, you will have five blocks (5 MB, 5
MB, 5 MB, 5 MB, and 2 MB); the block numbers for these
will be 0, 1, 2, 3, and 4, respectively.
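The block arithmetic in that example can be sketched in a few lines
of Python. This is only an illustration of the slicing described
above, not code taken from GmailFS; the function name
split_into_blocks and its parameters are made up for the example.

```python
def split_into_blocks(file_size_mb, block_size_mb=5):
    """Return (block_number, size_mb) pairs for a file of the given size.

    Hypothetical helper for illustration only; the real GmailFS code
    may slice files differently.
    """
    blocks = []
    offset = 0
    block_number = 0
    while offset < file_size_mb:
        # The final block holds whatever is left over.
        size = min(block_size_mb, file_size_mb - offset)
        blocks.append((block_number, size))
        offset += size
        block_number += 1
    return blocks


# The 22 MB file from the text, with the default 5 MB blocksize.
print(split_into_blocks(22))
```

For the 22 MB file, this yields five entries numbered 0 through 4,
the last of them 2 MB long.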
All subject lines contain an fsname (filesystem name) field that
serves two purposes:
It prevents the injection of spurious data into the
filesystem by external attackers. As such, the fsname
should be chosen with the same care that you would
exercise in choosing a password.
It allows multiple filesystems to be stored on a single
Gmail account. By mounting with different fsname
options set, the user can create distinct filesystems.
6.11.2. Installing the Hack
This isn't for the uninitiated. I haven't provided newbie-focused
step-by-step installation instructions, because if you aren't able
to take care of some of these details yourself, you probably
shouldn't be mucking about in this hack. If you're out of your
depth, sit back, relax, and read on for edification's sake.
Before you begin, make sure you have Python 2.3 and the
python2.3-dev packages installed.
Install Version 1.3 of FUSE
(http://sourceforge.net/projects/avf). Some Linux distributions
(such as Debian) make this available as a package. If your
distro doesn't, you'll need to download the source
(http://sourceforge.net/project/showfiles.php?group_id=21636)
and make and install it manually.
Next you'll need the Python FUSE bindings
(http://richard.jones.name/google-hacks/gmail-filesystem/fuse-python.tar.gz).
Download and extract fuse-python.tar.gz and
follow the instructions in fuse-python/INSTALL.
The Python FUSE bindings are also available from FUSE's CVS
page (http://sourceforge.net/cvs/?group_id=21636), but if you
grab CVS, remember that the Python bindings don't work with
the rest of CVS at the moment (at the time of this writing); you
still need to use FUSE 1.3.
Grab libgmail (http://sourceforge.net/project/showfiles.php?group_id=113492)
[Hack #80]. After unarchiving the package,
copy libgmail.py and constants.py to somewhere Python can find
them (/usr/local/lib/python2.3/site-packages works for Debian;
others may vary).
Finally, download GmailFS
(http://richard.jones.name/google-hacks/gmail-filesystem/gmailfs.tar.gz)
itself and unarchive it.
Copy gmailfs.py to somewhere easily accessible
(/usr/local/bin/gmailfs.py, for example) and mount.gmailfs (a
modified version of mount.fuse distributed with FUSE 1.3) to
/sbin/mount.gmailfs.
If you have an older version of Python interfering with the
running of GmailFS and would rather have it using a newer
version, alter the first line of gmailfs.py to point at
#!/path/to/newer/python2.3 rather than the
#!/usr/bin/env python default.
Take a moment to enjoy just how much you know about such
things and move on when you're ready.
6.11.3. Running the Hack
All that remains is to mount your Gmail filesystem.
You can do so via fstab or on the command line using mount. To
use fstab, create an /etc/fstab entry that looks something like
this:
/usr/local/bin/gmailfs.py /path/of/mount/point gmailfs \
noauto,username=gmailuser,password=gmailpass,fsname=zOlRRa
Replace gmailuser and gmailpass with your Gmail username and
password, respectively. The value you pass to fsname (zOlRRa
in this example) is the name you'd like to give this Gmail
filesystem.
It is important to choose a hard-to-guess name here. If others
can guess the fsname, they can corrupt your Gmail filesystem by
injecting spurious messages into your inbox (read: sending you
mail).
To mount the filesystem from the command line, use the
following command:
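One way to come up with a hard-to-guess name is to let a random
generator pick it. The following is a minimal sketch, not
something GmailFS itself provides: it assumes a modern Python 3
interpreter (the standard secrets module does not exist in the
Python 2.3 that GmailFS runs on), and the helper name
make_fsname is hypothetical.

```python
import secrets
import string


def make_fsname(length=16):
    """Return a random alphanumeric filesystem name (hypothetical helper).

    Sticking to letters and digits keeps the name safe to paste into
    a comma-separated list of mount options.
    """
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


# Generate a name once, then reuse the same value on every mount.
print(make_fsname())
```

Generate the name once and keep it somewhere safe; mounting with
a different fsname gives you a different (empty) filesystem.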
# mount -t gmailfs /usr/local/bin/gmailfs.py /path/of/mount/point \
-o username=gmailuser,password=gmailpass,fsname=zOlRRa
Again, replace gmailuser, gmailpass, and zOlRRa with your Gmail
username, Gmail password, and preferred filesystem name.
At the time of this writing, both of these ways of mounting have
serious security issues. If you run a multiuser system, others
can easily see your Gmail username and password. If this is a
problem for you, your only option at present is to modify
gmailfs.py itself, changing DefaultUsername, DefaultPassword,
and DefaultFsname as appropriate.
A future version of GmailFS (perhaps already out by the time
you read this) will load these values from configuration files
in your home directory.
Figure 6-28 shows my mounted gmailfs filesystem in action.
6.11.4. Things You Should Know
There are a few things you should know as you start strolling
about and storing things on your Gmail filesystem:
GmailFS also has a blocksize option, the default being 5
MB. Files smaller than the minimum blocksize will only
use the amount of space required to store the file, not
the full blocksize. Note that any files created during a
previous mount with a different blocksize will retain their
original blocksize until deleted. For most applications
you will make best use of your bandwidth by keeping the
blocksize as large as possible.
When you delete files, GmailFS will place the files in the
trash. The libgmail library does not currently support
purging items from the trash, so you will have to do this
manually through the regular Gmail web interface.
To avoid seeing the messages created for your Gmail
filesystem in your inbox, you probably want to create a
filter (http://gmail.google.com/support/bin/answer.py?answer=6579&query=filter&topic=&type=f)
to automatically archive GmailFS messages as they arrive
in your inbox. The best approach is probably to search
for the fsname value; it'll be in the subject of all your
GmailFS messages.
6.11.5. Outstanding Issues
At the time of this writing, there are some outstanding issues
with GmailFS that you should be aware of:
I don't recommend storing your only copy of anything
important on GmailFS for the following two reasons:
GmailFS is currently a 0.2 release and should be
treated as such. You can depend on its being
undependable.
There's no cryptography involved, so your files
will all be stored in plain text on Google's Gmail
servers. This will no doubt make some of you
nervous.
Performance is acceptable for uploading and
downloading very large files (obviously dependent on
your having decent bandwidth). However, operations
such as listing the contents of a large directory, which
requires many round trips, are extremely slow. The poor
performance here is largely independent of bandwidth
and is related to having to grab entire messages instead
of being able to use message summaries.
I haven't done any testing where GmailFS opens the
same file multiple times and performs subsequent
operations on the file. I suspect it will behave badly.
If all of this doesn't dissuade you from giving GmailFS a whirl,
have at it and enjoy. Just be sure to visit the GmailFS page
(http://richard.jones.name/google-hacks/gmail-filesystem/gmail-filesystem.html)
to find out what's new and grab the latest instructions and code.
6.11.6. See Also
Gmcp
(http://mindtrick.net/archives/2004/08/gmail_as_an_onli
is a small Perl utility to employ Gmail as a backup
service.
There's also a PHP
(http://ilia.ws/archives/15_Gmail_as_an_online_backup_s
backup utility.
Gmail Drive Shell Extension
(http://www.viksoe.dk/code/gmail.htm; Windows only)
wraps up the GmailFS into a virtual filesystem visible as
just another drive in Windows Explorer [Hack #79].
Richard Jones
Hack 79. Use Gmail as a Windows Drive
Drop a gig of Gmail storage on your Windows desktop and treat
it just about like any other drive.
If [Hack #78] had you Windows users salivating over the
prospect of adding a gigabyte of networked storage to your
computer, do we have a find for you. GMail Drive
(http://www.viksoe.dk/code/gmail.htm) drops the gigabyte of
storage allotted your Gmail account right on to your very
desktop. It looks and feels just like a regular hard drive, albeit a
tad slower (more than a tad if you're on dialup) being networked
rather than local.
And it's as simple as one might hope, being a Windows
application: none of the odd libraries to install, fstab entries
(whatever those are) to edit, and fuss of the Linux version you
paged past just a moment ago.
Point your browser at http://www.viksoe.dk/code/gmail.htm,
scroll down to the Download Files section, and grab a copy of
GMail Drive. The download should take only a few seconds.
Unzip the installer and double-click the Setup icon. A few
moments later, you should see a brand-spanking-new Gmail
Drive under My Computer in Windows Explorer.
Click the link and you'll be prompted to log in, as shown in Figure
6-29.
Figure 6-29. GMail Drive prompting for Gmail
login
You'll notice GMail Drive provides other options (just click the
More button in the Login window), including secure HTTP for
encrypted interaction with your remote "drive."
Enter your Gmail username and password and click the OK
button to log in. A few seconds later, your drive will be ready to
use. Drag-and-drop files merrily to and fro, between your local
drive and GMail Drive.
Right-click the GMail Drive icon to log out, check properties
(used space, free space), or to log back in. You'll notice the
Login option is actually "Login As..." This means you can mount
the Gmail account of a friend or family member as easily as you
can your own. Transfer that home movie to the grandparents'
computer, share your forays into music remixing with your
friends, or move files between your home and office computers
without need of toting about an external hard drive or shelling
out for a 1-gig USB drive.
6.12.1. See Also
[Hack #78].
Hack 80. Program Gmail
Try your hand at writing an alternative interface to Gmail using
the freely available Python, Perl, PHP, Java, and .NET libraries
and API frameworks.
The relatively simple and lightweight data interface to Gmail
stems from the separation between user interface (client-side
JavaScript) and data model. This has spawned myriad frontends
(graphical and otherwise), libraries, and unofficial "API"
implementations in Python, Perl, PHP, Java, and .NET.
For a glimpse of the Gmail engine and protocol underlying the
official Gmail interface and the lion's share of the unofficial APIs
and libraries written to the service, take a gander at Johnvey
Hwang's "About the Gmail engine and protocol"
(http://johnvey.com/features/gmailapi/; scroll down).
Programmatic access to Gmail is accomplished by
screen-scraping either the web interface or its underlying data
format. While the data format is pretty simple and isn't expected
to change dramatically, there's no telling what Google might do
that could adversely affect the various programmatic interfaces
to their service. Thus, it goes without saying that such hackery
comes with no quality of service guarantee. In other words,
expect breakages. And if you do notice something's gone wrong,
visit the home page of your chosen programmatic interface for
the latest version of the code, news, and further information.
Rather than taking you step by step through the same code in
each of the five languages and frameworks, I provide a
walkthrough in Python. The APIs are all rather similar, which
shouldn't come as any great surprise since they are all built
upon the Gmail "API" used by the candy-coated
JavaScript-powered Gmail Web interface.
6.13.1. Python
The libgmail (http://libgmail.sourceforge.net; GNU Public
License 2.0/PSF) Python binding for Gmail provides a nice,
clean interface (as you'd expect from Python) to your Gmail
account.
Libgmail bundles a lovely set of useful example applications,
usable right out of the box:
archive.py
Downloads your Gmail messages to your computer for
archiving, importing, or moving purposes.
gmailsmtp.py
Proxies SMTP requests, allowing you to use Gmail to
send email from the comfort of your preferred email
application; a related script, sendmsg.py, sends a single
email message via Gmail from the command line, not
unlike using the Unix mail command.
gmailpopd.py
Proxies a standard POP interface to mail from your
preferred email application.
gmailftpd.py
Pretends to be an FTP server, allowing you to download
(only) messages labeled "ftp" via a standard FTP
application.
6.13.1.1 Installing the hack
Installation is just a matter of downloading and unpacking the
library (http://sourceforge.net/project/showfiles.php?group_id=113492,
or click the Downloads link on the libgmail
home page) and putting it someplace findable by Python.
6.13.1.2 The code
Libgmail sports much functionality, each with its own rather
self-explanatory function name: getMessagesByQuery, getQuotaInfo,
getLabelNames, getMessagesByLabel, getrawMessage, and
getUnreadMsgCount. Leaf through libgmail.py for the kind of details
only a programmer could love.
Here's a snippet of sample code showing off login, folder
selection, and strolling through email, thread by thread, message
by message:
#!/usr/bin/python
# gmail_in_python.py
# A simple example of the libgmail Python binding for Gmail in action
# http://libgmail.sourceforge.net/
# Usage: python gmail_in_python.py

import libgmail

# Login
gmail = libgmail.GmailAccount('[email protected]', '12bucklemyshoe')
gmail.login()

# Select a folder, label, or starred messages--in this case, the Inbox
folder = gmail.getMessagesByFolder('inbox')

# Stroll through threads in the Inbox
for thread in folder:
    print thread.id, len(thread), thread.subject
    # Stroll through messages in each thread
    for msg in thread:
        print " ", msg.id, msg.number, msg.subject
Replace [email protected] and 12bucklemyshoe with your Gmail
email address and password. Instead of inbox, you can use any
of Gmail's standard folder names or your custom labels (e.g.,
starred, sent, or friends).
Save the code to a file called gmail_in_python.py.
6.13.1.3 Running the hack
Run the gmail_in_python.py script on the command line, like so:
$ python gmail_in_python.py
WARNING:root:Live Javascript and constants file versions differ.
ff602fe48d89bc3 1 Hello from Hotmail
  ff602fe48d89bc3 1 Hello from Hotmail
ff5fb9c2829c165 1 Hello Gmail via Gmail Loader
  ff5fb9c2829c165 1 Hello Gmail via Gmail Loader
ff5691f7170cb62 1 Hello Gmail via Gmail Loader
  ff5691f7170cb62 1 Hello Gmail via Gmail Loader
ff3f4310237b607 1 Howdy gmail-lite
  ff3f4310237b607 1 Howdy gmail-lite
6.13.1.4 Hacking the hack
Swap in a call to getMessagesByQuery and you now have a
command line right to the Gmail search engine:
#folder = gmail.getMessagesByFolder('inbox')
folder = gmail.getMessagesByQuery('from:rael subject:Howdy')
Here are the results of this little switch:
$ python gmail_in_python.py
WARNING:root:Live Javascript and constants file versions differ.
ff3f4310237b607 1 Howdy gmail-lite
  ff3f4310237b607 1 Howdy gmail-lite
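One way to take that switch a step further is to pass the query in from the command line rather than editing the script. The sketch below is a minimal illustration in modern Python, not part of libgmail: the helper names list_threads and main are hypothetical, and the code assumes only what the listing above shows, namely an account object with a getMessagesByQuery method whose threads expose id, subject, and a length.

```python
def list_threads(account, query):
    """Return (id, message count, subject) for each thread matching query.

    `account` is assumed to behave like the logged-in GmailAccount in the
    listing above: it offers getMessagesByQuery(), and each thread it
    yields has .id, .subject, and supports len().
    """
    return [(thread.id, len(thread), thread.subject)
            for thread in account.getMessagesByQuery(query)]


def main(account, argv):
    # Treat everything after the script name as one Gmail search query.
    query = " ".join(argv[1:])
    for thread_id, count, subject in list_threads(account, query):
        print(thread_id, count, subject)
```

Given a logged-in account, calling main(gmail, sys.argv) would let you type the search terms after the script name instead of hardcoding them.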
6.13.1.5 See also
gmail.py
(http://www.holovaty.com/blog/archive/2004/06/18/1751) is a
simple Python interface to Gmail, focusing on exporting raw
messages for backup and import.
6.13.2. Perl
Mail::Webmail::Gmail
(http://search.cpan.org/~mincus/Mail-Webmail-Gmail-1.00/lib/Mail/Webmail/Gmail.pm)
provides Perl hackers a programmatic interface to Gmail. You'll find
full POD documentation and more sample code than you can shake a
stick at in the module and online at the aforementioned URL.
A Comprehensive Perl Archive Network (CPAN) search for gmail
(http://search.cpan.org/search?query=gmail&mode=all) at the
time of this writing finds three Perl Gmail libraries:
Mail::Webmail::Gmail, WWW::GMail, and WWW::Scraper::Gmail.
6.13.3. PHP
GMailer or libgmailer (http://gmail-lite.sourceforge.net; GNU
Public License) is a PHP library for interacting with Gmail by way
of the curl library (http://www.php.net/curl) with SSL support
(http://www.openssl.org). It is the engine underlying gmail-lite
[Hack #76], an HTML-only interface to Gmail.
For full libgmailer documentation and plenty of sample code, leaf
through the online documentation
(http://gmail-lite.sourceforge.net/docs.html).
6.13.4. Java
G4j or GMail API for Java (http://g4j.sourceforge.net; GNU
Public License) is a Java interface to Gmail. The API comes with
GMailer for Java, a basic GUI frontend to Gmail built on top of
G4j.
Full documentation in Javadoc HTML is available online
(http://g4j.sourceforge.net/doc).
6.13.5. .NET
The Gmail Agent API (http://johnvey.com/features/gmailapi;
GNU Public License) is a .NET foundation for programming to
Gmail. A full package of source for the API itself; the Gmail
Agent Applet, a proof of concept Windows frontend to Gmail; and
associated Windows Installer projects are available for
download.
There's also full documentation available in both HTML format
(http://johnvey.com/features/gmailapi/docs) and as Windows
Help.
Chapter 7. Ads
Hacks 81-85
Section 7.2. Google AdSense
Section 7.3. Google AdWords
Hack 81. Get the Most out of AdWords
Hack 82. Generate Google AdWords
Hack 83. Scrape Google AdWords
Hack 84. Determine the Worth of AdWords Words
Hack 85. Serve Backup Ads
Hacks 81-85
You've probably noticed Google's advertising, or perhaps you
haven't. But it's there, on the periphery of every Google results
page. Then again, no one can blame you for overlooking the ads,
since they're small, text-only, and rather unobtrusive.
Nonetheless, they're effective. It turns out that hiding
straightforward, content-like ads in plain sight is rather a relief
in today's Web of flashy (and, indeed, flashing) in-your-face
advertising. Visitors have learned to tune out traditional
billboard-like ads and home in on textual content, and Google's
AdWords.
But simply grabbing eyeballs isn't quite enough. It's the click-
through (clicking an ad and following it to the advertiser and its
products) that counts. This is where Google's AdWords really
shine. They're not simply rotating, flip-of-the-coin ads; they're
every bit as relevant as the results of your search are. Query
Google for "volvo safety" and alongside the Volvo safety reports
and crash tests you'll see Car Safety ads from CARFAX
(http://www.carfax.com) and Volvo Auctions from
CHEAPCarFinder.com (http://www.cheapcarfinder.com). Try
pirates and you'll be served (at least at the time of this writing) a
Walmart ad. What's Walmart got to do with pirates, you ask? Not
much, it seems, but they purchased the AdWord and must have
had some reason for doing so. Click the link and Walmart's
product search turns up a Gameboy game, as well as VHS and
DVD versions of Disney's Pirates of the Caribbean. If Google has
nothing relevant to show, it'll show no ads at all.
7.2. Google AdSense
AdSense (http://www.google.com/adsense) is Google's
advertising service, intended to deliver just such advertising
magic to your web site. With more than 150,000 advertisers
signed up, there are sure to be ads targeting your readers, be
your site about baseball, computers, or rare spoon collecting.
7.3. Google AdWords
On the flipside of the AdSense coin, you'll find Google AdWords
(http://adwords.google.com), the fount from which those ads are
drawn. AdWords is an advertiser base 150,000 strong, from
mom-and-pops to Fortune 500s, all looking to make their
presence known and wares available alongside Google search
results.
And in true Google style, AdWords is different from just about
every advertising service you've ever seen. There's virtually no
price barrier; anyone with a few marketing dollars in their pocket
can buy a few keywords. It's so simple that even the most
inexperienced marketer can get a leg up. That said, there's a lot
to AdWords, and its simplicity can be deceptive.
This book, being focused on cool hacks,
tools, and techniques, doesn't attempt a
comprehensive introduction to Google's
advertising programs. For a comprehensive
introduction and detailed treatment, pick up
a copy of Google: The Missing Manual
(http://www.oreilly.com/catalog/googletmm;
O'Reilly) by Sarah Milstein and Rael
Dornfest.
Hack 81. Get the Most out of AdWords
Guest commentary by Andrew Goodman of Traffick on how to
write great AdWords.
AdWords (https://adwords.google.com) is just about the sort of
advertising program that you might expect to roll out of the big
brains at Google. The designers of the advertising system have
innovated thoroughly to provide precise targeting at low cost
with less work; it really is a good deal. The flipside is that it takes
a fair bit of savvy to get a campaign to the point where it stops
failing and starts working.
For larger advertisers, AdWords Select is a no-brainer. Within a
couple of weeks, a larger advertiser will have enough data to
decide whether to significantly expand their ad program on
AdWords Select or perhaps to upgrade to a premium sponsor
account.
I'm going to assume that you have a basic familiarity with how
cost-per-click advertising works. AdWords Select ads currently
appear next to search results on Google.com (and some
international versions of the search engine) and near search
results on AOL and a few other major search destinations. There
are a great many quirks and foibles to this form of advertising.
My focus here will be on some techniques that can turn a
mediocre, nonperforming campaign into one that actually makes
money for the advertiser while conforming to Google's rules and
guidelines.
One thing I should make crystal clear is that advertising with
Google bears no relationship to having your web site's pages
indexed in Google's search engine. The search engine remains
totally independent of the advertising program. Ad results never
appear within search results.
I'm going to offer four key tips for maximizing AdWords Select
campaign performance, but before I do, I'll start with four basic
assumptions:
High click-through rates (CTRs) save you money, so
that should be one of your main goals as an AdWords
Select advertiser. Google has set up the keyword bidding
system to reward high-CTR advertisers. Why? It's
simple. If 2 ads are each shown 100 times, the ad that
is clicked on 8 times generates revenue for Google twice
as often as the ad that is clicked on 4 times over the
same stretch of 100 search queries served. So if your
CTR is 4% and your competitor's is only 2%, Google
factors this into your bid. Your bid is calculated as if it
were "worth" twice as much as your competitor's bid.
Very low CTRs are bad. Google disables keywords that
fall below a minimum CTR threshold ("0.5% normalized
to ad position," which is to say, 0.5% for position 1, and
a more forgiving threshold for ads as they fall further
down the page). Entire campaigns will be gradually
disabled if they fall below 0.5% CTR on the whole.
Editorial disapprovals are a fact of life in this venue.
Your ad copy or keyword selections may violate
Google's editorial guidelines from time to time. Again,
it's very difficult to run a successful campaign when
large parts of it are disabled. You need to treat this as a
normal part of the process rather than giving up or
getting flustered.
The AdWords Select system is set up like an advertising
laboratory; that is to say, it makes experimenting with
keyword variations and small variations in ad copy a
snap. No guru can prejudge for you what will be your
"magical ad copy secrets," and it would be irresponsible
to do so, because Google offers such detailed real-time
reporting that can tell you very quickly what does and
does not catch people's attention.
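The first assumption can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes, as the description above suggests, that an ad's standing is its bid multiplied by its click-through rate. Google's actual formula is not given here, and the function names and figures are hypothetical.

```python
def click_through_rate(clicks, impressions):
    """CTR as a fraction: clicks divided by the number of times shown."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def effective_bid(bid, ctr):
    """Assumed weighting: the bid scaled by CTR (not Google's published formula)."""
    return bid * ctr


def rank_ads(ads):
    """Sort (name, bid, ctr) tuples from strongest to weakest effective bid."""
    return sorted(ads, key=lambda ad: effective_bid(ad[1], ad[2]), reverse=True)


# Hypothetical figures: you bid less but earn twice the competitor's CTR.
ads = [("competitor", 0.30, 0.02), ("you", 0.20, 0.04)]
for name, bid, ctr in rank_ads(ads):
    print(name, round(effective_bid(bid, ctr), 4))
```

Under this assumed weighting, a 20-cent bid at a 4% CTR outranks a 30-cent bid at 2%, which is the sense in which a high CTR saves you money.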
Now on to four tips to get those CTRs up and to keep your
campaign from straying out of bounds.
7.4.1. Matching Can Make a Dramatic
Difference
You'll likely want to organize your campaign's keywords and
phrases into several distinct ad groups (made easy by Google's
interface). This will help you more closely match keywords to the
actual words that appear in the title of your ad. Writing slightly
different ads to closely correspond to the words in each group of
keywords that you've put together is a great way to improve your
click-through rates. You'd think that an ad title (say, "Deluxe
Topsoil in Bulk") would match equally well to a range of keywords
that mean essentially the same thing. That is, you'd think this
ad title would create about the same CTR with the phrase "bulk
topsoil" as it would with a similar phrase (e.g., "fancy dirt
wholesaler"). Not so. Exact matches tend to get significantly
higher CTRs. Being diligent about matching your keywords
reasonably closely to your ad titles will help you outperform your
less diligent competition.
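A quick way to audit an ad group for this kind of matching is to check whether each keyword phrase shares any words with the ad title. The sketch below is a rough illustration; the function names and the simple word-overlap rule are assumptions made for the example, not anything AdWords itself computes.

```python
def words(text):
    """Lowercase a phrase and split it into a set of words."""
    return set(text.lower().split())


def unmatched_keywords(ad_title, keywords):
    """Return the keyword phrases that share no words with the ad title."""
    title_words = words(ad_title)
    return [kw for kw in keywords if not (words(kw) & title_words)]


title = "Deluxe Topsoil in Bulk"
group = ["bulk topsoil", "deluxe topsoil", "fancy dirt wholesaler"]
print(unmatched_keywords(title, group))
```

Here only "fancy dirt wholesaler" is flagged, suggesting it belongs in a different ad group with a title written to match it.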
If you have several specific product lines, you should consider
better matching different groups of key phrases to an ad written
expressly for each product line. If your clients like your store
because you offer certain specialized wine varieties, for
example, have an ad group with "ice wine" and related keywords
in it, with "ice wine" in the ad title. Don't expect the same
generic ad to cover all your varieties. Someone searching for an
"ice wine" expert will be thrilled to find a retailer who specializes
in this area. They probably won't click on or buy from a retailer
who just talks about wine in general. Search engine users are
passionate about particulars, and their queries are highly
granular. Take advantage of this passion and granularity.
The other benefit of getting more granular and matching
keywords to ad copy is that you don't pay for clicks from
unqualified buyers, so your sales conversion rate is likely to be
much higher.
7.4.2. Copywriting Tweaks Generally
Improve Clarity and Directness
By and large, I don't run across major copywriting secrets.
Psychological tricks to entice more people to click, after all,
may wind up attracting unqualified buyers. But there are times
when the text of an ad falls outside the zone of "what works
reasonably well." In such cases, excessively low CTRs kill any
chance your web site might have had to close the sale.
Consider using the Goldilocks method to diagnose poor-
performing ads. Many ads lean too far to the "too cold" side of
the equation. Overly technical jargon may be unintelligible and
uninteresting even to specialists, especially given that this is
still an emotional medium and that people are looking at search
results first and glancing at ad results as a second thought.
The following example is too cold:
Faster DWMGT Apps
Build GMUI modules 3X more secure than KLT. V. 2.0 rated as
"best pligtonferg" by WRSS Mag.
No one clicks. Campaign limps along. Web site remains world's
best-kept secret.
So then a hotshot (the owner's nephew) grabs the reins and tries
to put some juice into this thing. Unfortunately, this new creative
genius has been awake for the better part of a week, attending
raves, placing second in a snowboarding competition, and
tending to his various piercings. His agency work for a major
Fortune 500 client's television spots once received rave
reviews. Of course, those were rave reviews from industry
pundits and his best friends, because the actual ROI on the big
client's TV branding campaign was untrackable.
The hotshot's copy reads:
Reemar's App Kicks!
Reemar ProblemSolver 2.0 is the real slim shady. Don't trust
your Corporate security to the drones at BigCorp.
Unfortunately, in a nonvisual medium with only a few words to
work with, the true genius of this ad is never fully appreciated.
Viewers don't click and may be offended by the ad and annoyed
with Google.
The simple solution is something unglamorous but clear, such
as:
Easy & Powerful Firewall
Reemar ProblemSolver 2.0 outperforms BigCorp
Exacerbator 3 to 1 in industry tests.
You can't say it all in a short ad. This gets enough specific (and
true) info out there to be of interest to the target audience. Once
they click, there will be more than enough info on your web site.
In short, your ads should be clear. How's that for a major
copywriting revelation?
The nice thing is, if you're bent on finding out for yourself, you
can test the performance of all three styles quickly and cheaply,
so you don't have to spend all week agonizing about this.
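Such a test boils down to comparing click-through rates. A minimal sketch follows, with made-up impression and click counts for the three ads above; the 0.5% floor is the threshold quoted earlier in this hack, applied here without the position normalization, and the function names are hypothetical.

```python
MIN_CTR = 0.005  # the 0.5% floor quoted earlier, ignoring position normalization


def ctr(clicks, impressions):
    """Click-through rate as a fraction of impressions."""
    return clicks / impressions if impressions else 0.0


def summarize(variants):
    """Map each ad title to (ctr, at_risk) given {title: (impressions, clicks)}."""
    report = {}
    for title, (impressions, clicks) in variants.items():
        rate = ctr(clicks, impressions)
        report[title] = (rate, rate < MIN_CTR)
    return report


# Invented counts, for illustration only.
variants = {
    "Faster DWMGT Apps": (1000, 2),
    "Reemar's App Kicks!": (1000, 4),
    "Easy & Powerful Firewall": (1000, 21),
}
report = summarize(variants)
best = max(report, key=lambda title: report[title][0])
for title, (rate, at_risk) in report.items():
    print(f"{title}: {rate:.1%}{' (below minimum)' if at_risk else ''}")
print("Best performer:", best)
```

With real numbers from the AdWords reports in place of the invented ones, the same comparison shows which style to keep and which ads are in danger of being disabled.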
7.4.3. Be Inquisitive and Proactive with
Editorial Policies (But Don't Whine)
Editorial oversight is a big task for Google AdWords staff, a task
that often gets them in hot water with advertisers, who don't like
to be reined in. For the most part, the rules are in the long-term
best interest of this advertising medium, because they're aimed
at maintaining consumer confidence in the quality of what
appears on the page when that consumer types something into a
search engine. Human error, however, may mean that your
campaign is being treated unfairly because of a
misunderstanding. Or maybe a rule is ambiguous and you just
don't understand it.
Reply to the editorial disapproval messages (they generally
come from adwords-support@google.com). Ask questions until
you are satisfied that the rule makes sense as it applies to your
business. The more Google knows about your business, in turn,
the more they can work with you to help you improve your
results, so don't hesitate to give a bit of brief background in your
notes to them. The main thing is, don't let your campaign just sit
there disabled because you're confused or angry about being
disapproved. Make needed changes, make the appropriate polite
inquiries, and move on.
7.4.4. Avoid the Trap of "Insider Thinking"
and Pursue the Advantage of Granular
Thinking
Using lists of specialized keywords will likely help you to reach
interested consumers at a lower cost per click and convert more
sales than using more general industry keywords. Running your
ad on keywords from specialized vocabularies is a sound
strategy.
A less successful strategy, though, is to get lost in your own
highly specialized social stratum when considering how to pitch
your company. Remember that this medium revolves around
consumer search engine behavior. You won't win new customers
by generating a list of different ways of stating terminology that
only management, competitors, or partners might actually use,
unless your ad campaign is just being run for vanity's sake.
Break things down into granular pieces and use industry jargon
where it might attract a target consumer, but when you find
yourself listing phrases that only your competitors might know
or buzzwords that came up at the last interminable management
meeting, stop! You've started down the path of insider thinking!
By doing so, you may have forgotten about the customer and
about the role market research must play in this type of
campaign.
It sounds simple to say it, but in your AdWords Select keyword
selection, you aren't describing your business. You're trying to
use phrases that consumers would use when trying to describe a
problem they're having, a specific item they're searching for, or a
topic that they're interested in. It's mission statements from above
versus what customers and prospects actually type into search
engines. Big difference. (At this point, if you haven't yet done so,
you'd better go back and read over The Cluetrain Manifesto to get
yourself right out of this top-down mode of thinking.)
One way to find out about what consumers are looking for is to
use Wordtracker (http://www.wordtracker.com) or other keyword
research tools (such as the one that Google offers as part of the
AdWords Select interface, a keyword research tool that Google
promises it's working on). However, these tools are not in
themselves enough for every business; because more
businesses are using these keyphrase search frequency reports,
the frequently searched terms eventually become picked over by
competing advertisers, just what you want to avoid if you're trying
to sneak along with good response rates at a low cost per click.
You'll need to brainstorm as well. In the future, there will be more
sophisticated software-driven market research available in this
area. Search technology companies such as Ask Jeeves
Enterprise Solutions are already collecting data about the
hundreds of thousands of customer questions typed into the
search boxes on major corporate sites, for example. This kind of
market research is under-used by the vast majority of
companies today.
There are currently many low-cost opportunities for pay-per-
click advertisers. As more and larger advertisers enter the
space, prices will rise, but with a bit of creativity, granular
thinking, and diligent testing, the smaller advertiser will always
have a fighting chance on AdWords Select. Good luck!
Andrew Goodman
Hack 82. Generate Google AdWords
You've written the copy and you've planned the budget. Now,
what keywords are you going to use for your ad?
You've read about it and you've thought about it and you're
ready to buy one of Google's AdWords. You've even got your
copy together and you feel pretty confident about it. You've only
got one problem now: figuring out your keywords, the search
words that will trigger your AdWord to appear.
You're probably buying into the AdWords program on a budget,
and you definitely want to make every penny count. Choosing
the right keywords means that your ad will have a higher click-
through rate. Thankfully, the Google AdWords program allows
you to do a lot of tweaking, so if your first choices don't work,
experiment, test, and test some more!
7.5.1. Choosing AdWords
So where do you get the search keywords for your ad? There are
four places that might help you find them:
Log files
Examine your site's log files. How are people finding
your site now? What words are they using? What search
engines are they using? Are the words they're using too
general to be used for AdWords? If you look at your log
files, you can get an idea of how people who are
interested in your content are finding your site. (If they
weren't interested in your content, why would they
visit?)
Examine your own site
If you have an internal search engine, check its logs.
What are people searching for once they get to your
site? Are there any common misspellings that you could
use as an AdWord? Are there any common phrases that
you could use?
Brainstorm
What do people think of when they look at your site?
What keywords do you want them to think of? Brainstorm
about the product that's most closely associated with
your site. What words come up?
Imagine someone goes to a store and asks about your
products. How are they going to ask? What words would
they use? Consider all the different ways someone could
look for or ask about your product or service, and then
consider if there's a set of words or a phrase that pops
up over and over again.
Glossaries
If you've brainstormed until wax dribbles out your ears
but you're no closer to coming up with words relevant to
your site or product, visit some online glossaries to jog
your brain. The Glossarist (http://www.glossarist.com)
links to hundreds of glossaries on hundreds of different
subjects. Check and see if they have a glossary relevant
to your product or service, and see if you can pull some
words from there.
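Pulling search words out of a log file mostly means reading the referrer URLs. The following sketch assumes you have already extracted the referrer URLs from your logs (log formats vary) and that the search engine passes its query in a q parameter, as Google does; the function name and sample URLs are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs


def search_phrases(referrers, param="q"):
    """Count the search phrases found in a list of referrer URLs."""
    counts = Counter()
    for url in referrers:
        query = parse_qs(urlparse(url).query)
        for phrase in query.get(param, []):
            counts[phrase.strip().lower()] += 1
    return counts


# Hypothetical referrers, as they might appear in a web server log.
referrers = [
    "http://www.google.com/search?q=ice+wine&hl=en",
    "http://www.google.com/search?q=Ice+Wine",
    "http://www.example.com/links.html",
]
for phrase, hits in search_phrases(referrers).most_common():
    print(hits, phrase)
```

Phrases that turn up again and again are candidates for your keyword list; other search engines may use a different parameter name, which is why param is left adjustable.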
7.5.2. Exploring Your Competitors'
AdWords
Once you've got a reasonable list of potential keywords for your
ad, take them and run them in the Google search engine. Google
rotates advertisements based on the spending cap for each
campaign, so even after running a search three or four times,
you may see different advertisements each time. Use the
AdWords scraper to save these ads to a file and review them
later.
If you find a potential keyword that apparently contains no
advertisements, make a note. When you're ready to buy an
AdWord, you'll have to check its frequency; it might not be
searched often enough to be a lucrative keyword for you. But if it
is, you've found a potential advertising spot with no other ads
competing for searchers' attentions.
Hack 83. Scrape Google AdWords
Scrape the AdWords from a saved Google results page into a
form suitable for importing into a spreadsheet or database.
Google's AdWords (the text ads that appear to the right of the
regular search results) are delivered on a cost-per-click basis,
and purchasers of the AdWords are allowed to set a ceiling on
the amount of money that they spend on their ad. This means
that, even if you run a search for the same query word multiple
times, you won't necessarily get the same set of ads each time.
If you're considering using Google AdWords to run ads, you
might want to gather up and save the ads that are running for the
query words that interest you. Google AdWords is not included in
the functionality provided by the Google API, so you're left to a
little scraping to get at that data.
Be sure to read "A Note on Spidering and
Scraping" in Chapter 9 for some
understanding of what scraping means.
This hack will let you scrape the AdWords from a saved Google
results page and export them to a comma-separated (CSV) file,
which you can then import into Excel or your favorite
spreadsheet program.
This hack requires a Perl module called
HTML::TokeParser
(http://search.cpan.org/search?query=htmL%3A%3Atokeparser&mode=all).
You'll need to install it before the hack will
run.
7.6.1. The Code
Save this code to a text file named adwords.pl:
#!/usr/bin/perl
# usage: perl adwords.pl results.html
use strict;
use HTML::TokeParser;

die "I need at least one file: $!\n"
    unless @ARGV;

my @Ads;
for my $file (@ARGV) {
    # skip if the file doesn't exist
    # you could add more file testing here.
    # errors go to STDERR so they won't
    # pollute our csv file
    unless (-e $file) {
        warn "What??: $file -- $! \n-- skipping --\n";
        next;
    }

    # now parse the file
    my $p = HTML::TokeParser->new($file);
    while (my $token = $p->get_token) {
        # only start tags for anchors with ids aw0 through aw9
        next unless $token->[0] eq 'S'
            and $token->[1] eq 'a'
            and $token->[2]{id} =~ /^aw\d$/;
        my $link = $token->[2]{href};
        my $ad;
        if ($link =~ /pagead/) {
            my ($url) = $link =~ /adurl=([^\&]+)/;
            $ad->{href} = $url;
        } elsif ($link =~ m{^/url\?}) {
            my ($url) = $link =~ /\&q=([^&]+)/;
            $url =~ s/%3F/\?/;
            $url =~ s/%3D/=/g;
            $url =~ s/%25/%/g;
            $ad->{href} = $url;
        }
        $ad->{adwords} = $p->get_trimmed_text('/a');
        $ad->{desc}    = $p->get_trimmed_text('/font');
        ($ad->{url})   = $ad->{desc} =~ /([\S]+)$/;
        push(@Ads, $ad);
    }
}

print quoted( qw( AdWords HREF Description URL Interest ) );
for my $ad (@Ads) {
    print quoted( @$ad{qw( adwords href desc url )} );
}

# wrap each field in double quotes and join with commas
sub quoted {
    return join( ",", map { "\"$_\"" } @_ )."\n";
}
7.6.2. How It Works
Call this script on the command line ["How to Run the Hacks" in
the Preface], providing the name of the saved Google results
page and a file in which to put the CSV results:
% perl adwords.pl input.html > output.csv
input.html is the name of the Google results page that you've
saved. output.csv is the name of the comma-delimited file to
which you want to save your results. You can also provide
multiple input files on the command line if you'd like:
% perl adwords.pl input.html input2.html > output.csv
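For readers more at home in Python, the core of the approach can be sketched with the standard library alone. This is a simplified illustration rather than a port of adwords.pl: it collects only each ad's headline and target URL, the class name AdLinkParser is hypothetical, and it assumes the same markup the Perl code looks for (anchors whose id is aw followed by a digit, with the destination in an adurl or q parameter), which Google may well have changed.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class AdLinkParser(HTMLParser):
    """Collect (headline, target URL) pairs from anchors with id aw0..aw9."""

    def __init__(self):
        super().__init__()
        self.ads = []
        self._href = None   # target of the ad anchor being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and re.fullmatch(r"aw\d", attrs.get("id") or ""):
            self._href = self._target(attrs.get("href") or "")
            self._text = []

    def handle_data(self, data):
        # Only keep text that falls inside an ad anchor.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            headline = " ".join("".join(self._text).split())
            self.ads.append((headline, self._href))
            self._href = None

    @staticmethod
    def _target(href):
        """Pull the advertiser URL out of a redirect link, if there is one."""
        params = parse_qs(urlparse(href).query)
        for key in ("adurl", "q"):
            if key in params:
                return params[key][0]
        return href
```

To try it, create an AdLinkParser, pass the text of a saved results page to its feed method, call close, and read the collected pairs from its ads attribute.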
7.6.3. The Results
The results will appear in a comma-delimited format that looks
like this:
"AdWords","HREF","Description","URL","Interest"
"Free Blogging Site","http://www.1sound.com/ix",
" The ultimate blog spot Start your journal now ","www.1sound.com/
"New Webaga Blog","http://www.webaga.com/blog.php",
" Fully customizable. Fairly inexpensive. ","www.webaga.com","24"
"Blog this","http://edebates.e-thepeople.org/a-national/article/10
" Will online diarists rule the Net strewn with failed dotcoms? ",
"e-thePeople.org","26"
"Ford - Ford Cars","http://quickquote.forddirect.com/FordDirect.js
" Build a Ford online here and get a price quote from your local d
"www.forddirect.com","40"
"See Ford Dealer's Invoice","http://buyingadvice.com/search/",
" Save $1,400 in hidden dealership profits on your next new car. "
"buyingadvice.com","28"
"New Ford Dealer Prices","http://www.pricequotes.com/",
" Compare Low Price Quotes on a New Ford from Local Dealers and Sa
"www.pricequotes.com","25"
Each line is prematurely broken in this
code listing for the purposes of
publication.
You'll see that the hack returns the AdWords headline, the link
URL, the description in the ad, the URL on the ad (this is the
URL that appears in the ad text, while the HREF is what the URL
links to), and the Interest, which is the size of the Interest bar
on the text ad. The Interest bar gives an idea of how many click-
throughs an ad has had, showing how popular it is.
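Once the ads are in CSV form, ranking them is straightforward. The sketch below uses Python's csv module to sort rows by the Interest column; the sample rows are made-up stand-ins in the format shown above, and the function name is hypothetical.

```python
import csv
import io


def ads_by_interest(csv_text):
    """Return the CSV rows as dicts, highest Interest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Treat a missing or non-numeric Interest value as zero.
    def interest(row):
        value = (row.get("Interest") or "").strip()
        return int(value) if value.isdigit() else 0

    return sorted(rows, key=interest, reverse=True)


sample = '''"AdWords","HREF","Description","URL","Interest"
"Ad one","http://www.example.com/","First description","www.example.com","24"
"Ad two","http://www.example.org/","Second description","www.example.org","40"
'''
for row in ads_by_interest(sample):
    print(row["Interest"], row["AdWords"])
```

Reading output.csv from disk and passing its contents to ads_by_interest would give the same ordering for real scraped ads.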
Tim Allwine and Tara Calishain
Hack 84. Determine the Worth of
AdWords Words
Harness the Google AdWords marketplace to guesstimate the
value of a keyword or phrase in online advertising.
Google AdWords create what can only be termed an advertising
marketplace.
You can choose to pay more to acquire more prominence on a
results page, or pay less and still see your fair share of eyeballs
and clicks until your daily budget is spent. By offering to pay a
little more for each click-through, you'll move yourself up a spot
or two in that list of 8 to 10 Sponsored Links on Google results
pages.
Surely, then, you could just be sure to pay enough to keep your
ad in the top spot? Not so, actually.
In addition to the cost-per-click (CPC) you're prepared to pay,
AdWords pays close attention to the click-through-rate (CTR),
the proportion of how many times your ad is shown versus how
many times people actually click on it. By carefully balancing
the price and effectiveness of ads, Google makes sure the top
spots belong to the most relevant and targeted ads, not just
those with the deepest pockets.
Now, this might appear at first blush to be a game of poker, and
to a certain extent it is. That said, the AdWords Traffic Estimator
(see Figure 7-7, later in this hack) is like a silent partner,
advising you on your bets (sorry, bids) and guesstimating your
average position (1.0 is the top spot, 2.0 is the second, and so
forth) in that list of ads.
The AdWords Traffic Estimator is an incredibly useful tool for
experienced online advertisers and newbies alike.
And, of particular interest to us hacker types, it has the nice side
effect of harnessing the Google AdWords marketplace to place a
very real (albeit estimated) price on the relative value of
individual words or phrases. It is on this alternate use of the
estimator that this hack is focused.
We'll do this two ways: by hand (clicking through the AdWords
site, filling in forms, copying, and pasting) and then
programmatically, turning keywords or phrases into a comma-
separated (CSV) file suitable for import into just about any
spreadsheet or database application that you might be running.
Why do this manually when you can have
your computer do all the work for you?
First, the whole AdWords process is rather
well done and educational. Second, the
script automates what are supposed to be
the actions of a human on the way to
signing up for AdWords and so employs a
set of hacks and scrapes and, as such, is
brittle and could well not work by the time
you read this.
7.7.1. By Hand
It's quite a journey from the AdWords home page to the AdWords
Traffic Estimator, so we'll walk you through how to get there with
the bare minimum of work.
Point your web browser at the home page
(http://adwords.google.com) and click the "Click to begin"
button (Figure 7-1).
Figure 7-1. Click to begin
Unless you're actually interested in specifying language or
location targeting, go ahead and skip on past the "Choose your
language and location targeting" page, shown in Figure 7-2, by
clicking the Save & Continue button.
Figure 7-2. Choose your language and geo-targeting
You do need to pick a country or set of countries on the page
shown in Figure 7-3. Again, unless you're actually interested in
choosing specific countries, just click the Add button to select
All Countries and the Save & Continue button to move on.
Figure 7-3. Choose your countries
On the "Create ads" page shown in Figure 7-4, you need to
create a placeholder ad. Fill in the form and click the Continue
button when you're done.
As an aside, notice that I've spelled
Google with two too many "o"s: Goooogle.
It turns out, for perfectly understandable
reasons, that only Google can use the word
Google (or Gooogle) in their AdWords
program.
Figure 7-4. Create an ad, any ad
Now for the interesting part: choosing keywords or phrases to
evaluate. Type one or more keywords or phrases into the
box shown in Figure 7-5 and click the Save Keywords button.
Figure 7-5. Choose your keywords or phrases
AdWords drops your keywords into a pretty table, shown in
Figure 7-6, and gives you the chance to specify your preferred
currency and how much you'd pay for a single click. Adjust these
if you wish and, when you're ready, click the Calculate Estimates
button.
Figure 7-6. Calculate estimates
Finally, we get to the payoff, shown in Figure 7-7: the table is
filled in with reasonable guesstimates of average CPC, cost per
day if you adjust for the number of clicks per day shown, and
expected average position of your ad in the list of Sponsored
Links on a results page in which your ad appears.
Figure 7-7. Average Cost-Per-Click (CPC)
For our purpose at hand, it's the average CPC numbers that we
were after, and here you have them.
At any time, you can change the currency and maximum cost
you'd be willing to consider paying per click. Click the
Recalculate Estimates button to update the table. You can also
change keywords by clicking the Change Keywords button and
altering the keywords and phrases that you entered in Figure 7-5.
You'll find another handy tool behind the "find alternatives" links
associated with each keyword. As shown in Figure 7-8, AdWords
is remarkably good at finding related keywords and phrases for
you. Select any number by clicking their checkboxes and click
the "Add these keywords" button at the bottom of the screen.
Otherwise, click the Cancel button to leave things as they are.
Figure 7-8. Find alternative keywords
Now that you have your keywords or phrases and their
respective worth in tabular format, you can simply highlight them
with your mouse, as shown in Figure 7-9, and copy them as you
would any other text (Control-C on Windows, Command-C on
Macintosh).
Figure 7-9. Copy estimated Cost-Per-Clicks
Paste what you've copied into a text file or spreadsheet (Figure
7-10) and rearrange and clean up as you see fit. At this point
you have data you can work with, save as a CSV and import into
your preferred database for further analysis, or paste into an
email message as a gentle nudge to your marketing department.
Figure 7-10. Paste estimates into Excel
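If you'd rather skip the spreadsheet, the cleanup step can be sketched in a few lines of Python. This is only a sketch under one assumption: that your browser copies table cells as tab-separated text, which is common but not guaranteed. The function name and sample rows are hypothetical.

```python
import csv
import io

def pasted_table_to_csv(pasted_text):
    """Convert tab-separated rows copied from a browser table into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for line in pasted_text.splitlines():
        if not line.strip():
            continue  # skip blank lines picked up while highlighting
        cells = [cell.strip() for cell in line.split("\t")]
        writer.writerow(cells)
    return out.getvalue()

# Hypothetical paste: two columns, with a stray blank line in the middle.
sample = "Keyword\tAverage Cost-Per-Click\nGeorge Bush\t$0.21\n\nJohn Kerry\t$0.18\n"
csv_text = pasted_table_to_csv(sample)
print(csv_text)
```

Quoting every field keeps values such as "$0.21" or phrases containing commas intact when the file is later imported elsewhere.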
7.7.2. Programmatically
Doing things by hand is certainly good enough if you only do so
once in a while. Let's say, however, that you want to keep an eye
on the relative costs of AdWords words that you're interested in
buying into at some point, much as you would monitor individual
stocks by building a portfolio and checking in on a regular basis
or having your online brokerage email you alerts. Google, at
least at the time of this writing, offers no such service: the focus
of their interface is on the ads you're building right now and
those you're running and maintaining on an ongoing basis.
So, let's build such a service on our own.
7.7.2.1 The code
This script mimics the activity of someone manually going
through the AdWords site, filling in forms, clicking buttons, and
eventually copying CPC estimates and pasting them to the
screen or a CSV file. In other words, the code does just what we
did by hand a scant moment ago.
You'll need to pick up and install a few prerequisite Perl [...]
way: Crypt::SSLeay for talking to the Google AdWords [...]
channel (required by AdWords), WWW::Mechanize for au[...]
interaction with the Google AdWords site, HTML::TableContentParser
for gleaning results from HTML tables, and Text::CSV for s[...]
CSV format. (See [Hack #92] for guidance on installing [...]
The only one of these prerequisites that might cause y[...]
is Crypt::SSLeay on Windows. ActiveState, makers of A[...]
Windows, does not (at least not at the time of this writing [...]
to distribute the module as a PPM due to Canadian law[...]
cryptographic software. Check their "Status of the Act[...]
Repositories" page
(http://aspn.activestate.com/ASPN/Downloads/Active[...]
for details and alternate installation instructions.
Save the following code as adwords_worth.pl:
#!/usr/bin/perl -w
# adwords_worth.pl
# Automate gleaning Google AdWords estimated cost-per-clicks (CPCs
# Usage: perl adwords_worth.pl <keyword1> <keyword2> [..]
# perl adwords_worth.pl < keywords.txt
use strict;
use WWW::Mechanize;
use HTML::TableContentParser;
use Text::CSV;
=head1 NAME
adwords_worth - Returns estimated Google AdWords cost-per-clicks (
of provided keywords in comma-separated (CSV) format.
=cut
# Fill up keywords.
my $keyword_string;
if( not @ARGV ) {
# You piped in a file
local $/ = undef;
$keyword_string = <STDIN>;
} else {
# Keywords are specified on command line
$keyword_string = join( "\n", @ARGV );
$keyword_string =~ s/\s+/\n/g;
}
die "No keywords specified!" unless $keyword_string =~ /\w+/;
=head1 SYNOPSIS
adwords_worth.pl keyword1 [..]
adwords_worth.pl < keywords.txt
=cut
# Set up WWW::Mechanize to die on errors.
my $agent = WWW::Mechanize->new( autocheck => 1 );
# Get initial page.
print STDERR "Fetching the Adwords initial page... ";
$agent->get('https://adwords.google.com/');
$agent->form_number(3);
$agent->click('start');
print STDERR "ok\n";
print STDERR "Visiting the Language and Targeting page... ";
# On Language and Targeting page.
# Defaults are okay for now.
$agent->click('save');
print STDERR "ok\n";
# On country selector.
# Right now default value is "All Countries".
print STDERR "Visiting the Country selector... ";
$agent->click('save');
print STDERR "ok\n";
# On Create Ad page.
# Fill in placeholder values, since it doesn't matter.
# CAVEAT: All creative lines must be spelled correctly.
# See: Adwords editorial guidelines.
print STDERR "Creating a placeholder ad... ";
$agent->current_form->value( 'adGroupName', 'groupname' );
$agent->current_form->value( 'creative.line1', 'Spelling' );
$agent->current_form->value( 'creative.line2', 'Spelling' );
$agent->current_form->value( 'creative.line3', 'Spelling' );
$agent->current_form->value( 'creative.visibleUrl', 'a.com' );
$agent->current_form->value( 'creative.destUrl', 'a.com' );
$agent->click('save');
print STDERR "ok\n";
# On Keywords page.
print STDERR "Plugging in your keywords... ";
$agent->current_form->value( 'keywords', $keyword_string );
$agent->click('save');
print STDERR "ok\n";
# On Price Table Page, but no values are in the table.
print STDERR "Recalculating keyword values... ";
$agent->click('recalculate');
print STDERR "ok\n";
# Now on Price Table page, and the table now has values.
print STDERR "Gleaning keyword values and building you a CSV...\n\n";
my $p = HTML::TableContentParser->new( );
my $tables = $p->parse( $agent->content( ) );
# Table with the values has its class attribute set to report.
my @report_tables =
grep { exists $_->{class} and $_->{class} eq 'report' } @$tables;
# Assuming that Google only has one report table per page.
my $table = $report_tables[0];
# Make CSV object out here instead of having loop make X of them.
my $csv = Text::CSV->new( );
# Get the rows of cells out of $table's convoluted structure.
# TODO naming of variables here is odd, but check out Dumper(\@row
my @row_cell_objs = grep { $_->{cells} } @{ $table->{rows} };
my @data_cells = map { $_->{cells} } @row_cell_objs;
foreach my $row ( $table->{headers}, @data_cells ) {
# Just being safe here with references.
if( ref $row eq 'ARRAY' ) {
# Eliminate title cell and cells in rightmost column.
# They contain only links.
my @table_cells =
grep { not exists $_->{class} or $_->{class} ne 'rightco
my @data = map { $_->{data} } @table_cells;
foreach (@data) {
# Remove HTML tags and surrounding whitespace.
s/<[^>]*>//g;
s/^\s+//;
s/\s+$//;
# Turn the HTML entity for a less-than sign back into a literal "<".
s/&lt;/</g;
# Number of clicks contains commas, but we don't want
tr/,//d;
}
# Make a CSV line and print it.
if( $csv->combine(@data) ) {
print $csv->string, "\n";
} else {
my $err = $csv->error_input;
print "combine( ) failed on argument: ", $err, "\n";
}
} else {
print "Row is not an array of cells!\n";
}
}
print STDERR "Done.\n";
=head1 AUTHOR
=cut
7.7.2.2 Running the hack
You can invoke the script in two ways. The first is to pass
keywords or phrases on the command line ["How to Run the
Hacks" in the Preface] to be passed on to and priced by
AdWords:
$ perl adwords_worth.pl adword "another adword" "one more"
Be sure to wrap phrases in quotes;
otherwise, they'll be seen as individual
words.
The second way is to maintain a text file of keywords and
phrases, one per line (without any enclosing quotes), and feed
that file to the script:
$ perl adwords_worth.pl < adwords.txt
Both of these invocations produce output as CSV, printing it
to the screen. To capture it in a .csv file, redirect the output
like so:
$ perl adwords_worth.pl < adwords.txt > adwords.csv
The script also keeps you apprised of just where it is in the
process with status messages printed to the screen, lest you
think it's gone off into the wilderness, never to return.
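Once the estimates are captured as CSV, ranking keywords by worth takes only a little post-processing. The Python sketch below assumes the column order of the sample run in the results section (keyword first, average cost-per-click third) and prices in dollars. The fifth column's header is cut off in that listing, so the sample calls it "Position" as a placeholder; the function name is likewise hypothetical.

```python
import csv
import io

def parse_estimates(csv_text):
    """Map each keyword to its average cost-per-click as a float."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    worth = {}
    for row in reader:
        keyword, average_cpc = row[0], row[2]
        if keyword == "Overall":
            continue  # summary row, not a keyword
        worth[keyword] = float(average_cpc.lstrip("$"))
    return worth

# Sample rows shaped like the script's output; "Position" is a placeholder header.
sample = '''"Keyword","Clicks /Day","AverageCost-Per-Click","Cost /Day","Position"
"George Bush","910.0","$0.21","$189.88","1.3"
"John Kerry","810.0","$0.18","$138.36","1.4"
"Ralph Nader","< 0.1","$0.00","$0.00","-"
"Overall","1771.0","$0.19","$334.62","1.3"
'''
worth = parse_estimates(sample)
ranked = sorted(worth.items(), key=lambda item: item[1], reverse=True)
for keyword, cpc in ranked:
    print(f"{keyword}: ${cpc:.2f}")
```

In practice you would read the text from the adwords.csv file produced by the redirect above rather than from an inline string.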
7.7.2.3 The results
Here's a sample run using the same keywords/phrases used in
the "By Hand" walkthrough.
The estimated worth may well have
changed between the versions and, indeed,
any two invocations of the script. We told
you AdWords was a real market; and if it
weren't, this little experiment wouldn't be
anywhere near as interesting.
$ perl adwords_worth.pl "George Bush" "John Kerry" "Ralph Nader" "
Fetching the Adwords initial page... ok
Visiting the Language and Targeting page... ok
Visiting the Country selector... ok
Creating a placeholder ad... ok
Plugging in your keywords... ok
Recalculating keyword values... ok
Gleaning keyword values and building you a CSV...
"Keyword","Clicks /Day","AverageCost-Per-Click","Cost /Day","Avera
"George Bush","910.0","$0.21","$189.88","1.3"
"John Kerry","810.0","$0.18","$138.36","1.4"
"Ralph Nader","< 0.1","$0.00","$0.00","-"
"Someone Else","33.0","$0.09","$2.67","1.2"
"Overall","1771.0","$0.19","$334.62","1.3"
7.7.2.4 Hacking the hack
There are any number of ways you could slice and dice this
hack:
Add functionality to set your preferred maximum CPC by adding $agent->current_form->value( 'price', 1.25 ); just before $agent->click('recalculate'); on the pricing page. Or further alter the script to take this value on the command line.
Make the script more interactive, allowing you to retrieve estimated CPC, alter your keyword list, and go back for recalculation, all without the overhead of logging back in and meandering through the same screens each time.
Automate things still further such that you can set up campaigns and ads right from the command line or another application. (This is something we weren't inclined to get into, but thought a mention worthwhile.)
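The stock-portfolio idea from earlier in this hack suggests one more variation: keep each day's estimates and report how prices have moved. Here is a minimal Python sketch of the comparison step. It assumes you have already reduced two runs to keyword-to-CPC mappings; the function name and all figures are hypothetical.

```python
def cpc_changes(previous, current):
    """Return {keyword: change in CPC} for keywords present in both snapshots."""
    return {
        keyword: round(current[keyword] - previous[keyword], 2)
        for keyword in current
        if keyword in previous
    }

# Hypothetical snapshots from two separate runs of the script.
yesterday = {"George Bush": 0.21, "John Kerry": 0.18}
today = {"George Bush": 0.25, "John Kerry": 0.15, "Someone Else": 0.09}

changes = cpc_changes(yesterday, today)
for keyword, delta in changes.items():
    print(f"{keyword}: {delta:+.2f}")
```

Keywords that appear in only one snapshot are skipped, since there is nothing to compare them against.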
Automating and scraping is a brittle
process ["A Note on Spidering and
Scraping" in Chapter 9] and, as such, is
subject to breakage, lockouts, or simply
being asked to stop.
Leland Johnson and Rael Dornfest
Hack 85. Serve Backup Ads
Use AdSense's built-in (and rather thoughtful) ability to serve
ads from alternate URLs when there are no targeted ads to
offer.
There's a time and place for public service announcements. You
just might not think your web site is the place, and certainly not if
it happens more than occasionally. When you signed up for
AdSense (while you're no doubt a good citizen who pays their
public radio and television dues), your intent was to reap a
revenue stream from all the hard work that you've put into your
content.
Yet there are times when a new section of your site hasn't yet
been noticed and indexed by Google, AdSense has nothing
appropriately targeted in its inventory, or there's a temporary
outage of some kind. The net result is that you'll be running
public service ads for the Red Cross or the like rather than
revenue-generating, targeted advertising. Google AdSense
doesn't get paid and so doesn't pay you for click-throughs on
public service advertisements.
Now, you can either simply be OK with this coming up every so
often (I know I am), or you can make use of a backup system
Google AdSense provides: alternate ad URLs.
Point your browser at Google AdSense
(http://www.google.com/adsense) and click the Ad Settings tab
at the top of the page. Then, scroll down until you see "Alternate
ad URL or color," as shown in Figure 7-11.
Figure 7-11. Provide an alternate URL for ads
when AdSense has only public service
advertisements to offer your site
Google AdSense suggests
(https://google.com/adsense/faq#basics10) four backup
options:
Image
Paste in the URL of an image somewhere on the Web, ad
or not, static or dynamically generated. This can be an
alternate image that you've created and are serving from
your own site, one produced on-the-fly by another
advertising service, or any other image that either has
some revenue stream associated with it or simply
tickles your fancy. For example, to serve up a static
image named advert1.jpg residing on your web site,
you'd provide a URL like
http://www.example.com/images/advert1.jpg.
Clickable image
Provide the URL of an HTML page somewhere out on the
Web that contains only a snippet of markup for a
hyperlinked image. For example, you might have a file on
your site called adsense_alternate.html that contains the
following line:
<a href="http://www.example.com/storefront/"><img src="htt
images/advert2.jpg border="0" /></a>
That's all the file should have in it,
mind you; leave off all the opening
<html><head></head><body> and
closing </body></html> bits and
everything else you usually pack
into your pages.
The URL you'd provide as an alternate would then be a
pointer to that partial page, something like
http://www.example.com/adsense_alternate.html.
HTML color code
If you have nothing to display as an alternative and are
dead set against running public service ads, blank out
the space where the AdSense ad would have gone by
providing the hexadecimal HTML color code of your
page's background or that particular bit of real estate.
For example, if your page had a background color of
#160B35, a lovely dark blue that I use on my own site,
you'd type that color code right into the "Alternate ad
URL or color" field.
Collapse your ad
Google provides an HTML file you can download to and
serve from your own web site that calls a bit of
JavaScript to collapse your ad so that it doesn't show in
the event you'd otherwise have seen a public service ad.
For instructions and a link to download the file, visit
https://google.com/adsense/faq#basics13.
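Because the one "Alternate ad URL or color" field accepts either a URL or a color code, it can be handy to sanity-check a value before pasting it in. The Python sketch below is an illustration only: the function name is hypothetical, and the rules it applies (an http or https URL, or six hexadecimal digits with an optional leading #) are assumptions, not AdSense's documented validation.

```python
import re
from urllib.parse import urlparse

def classify_alternate(value):
    """Classify a candidate value for the "Alternate ad URL or color" field."""
    value = value.strip()
    # Six hex digits, with or without a leading "#", is treated as a color.
    if re.fullmatch(r"#?[0-9A-Fa-f]{6}", value):
        return "color"
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "invalid"

print(classify_alternate("#160B35"))
print(classify_alternate("http://www.example.com/adsense_alternate.html"))
print(classify_alternate("advert1.jpg"))
```

A bare filename such as advert1.jpg is flagged as invalid here because every option above calls for a full URL.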
Whichever you choose, when you click the "Update code"
button, a smidgeon of JavaScript (the third line in Figure 7-12)
will be added to the AdSense code that you paste into your web
page. This additional line provides all AdSense needs to serve
up your alternate ad choice when it has no targeted ad to run on
your site.
Figure 7-12. An alternate ad URL embedded in
Google AdSense JavaScript code
Then again, there is a fifth alternative...
Amazon/Google Ad Replacement (AGAR;
http://www.bestdealsdiscounts.com/agar; GNU Public License)
is a Perl script that supplements your Google AdSense ads with
product advertisements drawn from Amazon's Web Services
(AWS; http://webservices.amazon.com) and Associates
(http://associates.amazon.com) programs. Not only does it
supplement AdSense, but it also mimics it in appearance,
supports all the AdSense ad sizes, and allows you to customize
your color scheme to match what you've chosen for AdSense.
For AGAR to be useful (and financially
rewarding), you'll need to have signed up
as an Amazon Associate
(http://associates.amazon.com) through
which you make money on purchases
resulting from click-throughs on your site.
Download AGAR and get it running as a CGI script ["How to Run
the Hacks" in the Preface]. There's not much at all you need to
change in the script itself, save replacing the default Amazon
Associates ID with your own:
my $associate_id = "insert your amazon associates id here";
If you're in the United Kingdom rather than
the United States, you'll also want to
change the locale in my $locale = "us"; to
uk and my $uk_associate_id =
"coolstufftoown"; to your U.K. Amazon
Associates ID. If you're neither in the U.S.
nor the U.K., there is some further
adjustment necessary, but we leave that
as an exercise for the reader.
Point your browser directly at the CGI script to test it out and
you should see an ad banner, as shown in Figure 7-13, easily
mistaken for an AdSense ad at first blush, but clearly linked to
Amazon products.
Figure 7-13. An AGAR-generated AdSense-like
Amazon banner ad
The product category is chosen at random by default (go ahead
and reload the page a few times to see this in action), but this
can be customized either by altering the settings baked into the
script itself or embedding settings into the agar.cgi URL. For
example, instead of just pointing at agar.cgi, try
agar.cgi?input_mode=kitchen&input_id=491864&ad_format=125x125_as.
This produces a 125 by 125 pixel ad drawing from Amazon's
"kitchen" category.
While the concept of mode and id, as
expressed in the preceding URL, is beyond
the scope of this book, suffice it to say that
you need to pass matching textual and
numerical browse IDs. You'll find a detailed
description of browse nodes and IDs in the
Amazon Web Services documentation and a
list of some of the text/number pairs in the
AGAR code itself; look for %browse_ids =.
For an introduction to Amazon Web
Services and all other things Amazon, pick
up this book's cousin, Amazon Hacks
(http://www.oreilly.com/catalog/amazonhks;
O'Reilly) by Paul Bausch.
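If you generate many such agar.cgi URLs, say one per page category, building the query string in code avoids typos in the separators. The Python sketch below uses the three parameter names from the example URL above; the base URL and the function name are hypothetical.

```python
from urllib.parse import urlencode

def agar_url(base, mode, browse_id, ad_format):
    """Build an agar.cgi URL with settings embedded in the query string."""
    query = urlencode({
        "input_mode": mode,       # textual browse ID, e.g. "kitchen"
        "input_id": browse_id,    # matching numerical browse ID
        "ad_format": ad_format,   # ad size, e.g. "125x125_as"
    })
    return f"{base}?{query}"

# Hypothetical location of the script on your own server.
url = agar_url("http://www.example.com/cgi-bin/agar.cgi", "kitchen", 491864, "125x125_as")
print(url)
```

The mode and ID still have to be a matching pair, as the note above explains; the helper only assembles the URL.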
(If I may, I'd like to end with a pitch to at least consider letting
the AdSense public service advertisements run. Sorry, I just
couldn't help myself.)
Chapter 8. Webmastering
Hacks 86-91
Section 8.2. Google's Importance to Webmasters
Section 8.3. The Mysterious PageRank
Section 8.4. The Equally Mysterious Ranking Algorithm
Section 8.5. Keeping Up with Google's Changes
Section 8.6. In a Word: Relax
Hack 86. A Webmaster's Introduction to Google
Hack 87. Get Inside the PageRank Algorithm
Hack 88. 26 Steps to 15K a Day
Hack 89. Be a Good Search Engine Citizen
Hack 90. Clean Up for a Google Visit
Hack 91. Remove Your Materials from Google
Hacks 86-91
When the Web was younger, the search engine field was wide
open. There were lots of major search engines, including
AltaVista, Excite, HotBot, and Webcrawler. This proliferation of
search engines had both advantages and disadvantages. One
disadvantage was that you had to make sure you submitted your
site to several different places, while an advantage was that
you had several inflows of traffic spawned from search engines.
As the number of search engines has dwindled, Google's index
(and influence) has grown. You no longer have to worry so much
about submitting to different places, but you do have to be aware
of Google at all times.
8.2. Google's Importance to Webmasters
But isn't Google just a search engine web site like any other?
Actually, its reach is far greater than that. Google partners with
other sites to use the Google index results, including the likes of
heavyweight properties AOL and Yahoo!, not to mention the
multitude of sites out there making use of the Google API. So
when you think about potential visitors from Google search
results, you have to think beyond traditional search site borders.
Google's perception of your site has become increasingly
important, which means that you're going to have to be sure that
your site abides by the Google rules or risk not being picked
up. If you're very concerned about search-engine traffic, you're
going to have to make sure that your site is optimized for luring
in Google spiders and being indexed effectively. And if you're
concerned that Google should not index some parts of your site,
you need to understand the ins and outs of configuring your
robots.txt file to reflect your preferences.
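To get a feel for how a robots.txt file expresses those preferences, you can test rules locally before publishing them. The Python sketch below uses the standard library's urllib.robotparser; the rules, paths, and URLs are hypothetical examples, and a real crawler's interpretation may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /private/, allow everything else.
robots_txt = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

allowed_public = parser.can_fetch("Googlebot", "http://www.example.com/index.html")
allowed_private = parser.can_fetch("Googlebot", "http://www.example.com/private/notes.html")
print(allowed_public, allowed_private)
```

Checking a handful of representative URLs this way catches the most common mistake, a Disallow line that blocks more (or less) of the site than intended.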
8.3. The Mysterious PageRank
You might hear a lot of people talk about Google's PageRank,
bragging about attaining the misty heights of rank 7 or 8 or
speaking reverently of sites that have achieved 9 or 10.
PageRanks range from 0 (sites that have been penalized or not
ranked) to 10 (reserved for only the most popular sites, such as
Yahoo! and Google itself). The only place where you can actually
see the PageRank of a given URL is the Google Toolbar [Hack
#60], though you can get some idea of popularity from the
Google Directory. Listings in the Google Directory contain a
green bar next to them, providing a good idea of a listing's
popularity without giving an exact number.
Google has never provided the entire formula for their PageRank,
so all you will find in this book is conjecture. It wouldn't surprise
me to learn that the formula is changing all the time; as millions
of people try myriad methods to increase their page ranking,
Google has to take these efforts into account and (sometimes)
react against them.
Why is PageRank so important? Because Google uses that as
one aspect of determining how a given URL will rank among
millions of possible search results. But that's only one aspect.
The other aspects are determined via Google's ranking
algorithm.
8.4. The Equally Mysterious Ranking
Algorithm
If you thought Google was tight-lipped about how it determines
PageRank, it's an absolute oyster when it comes to the ranking
algorithm, the way that Google determines the order of search
results. This book can give you some ideas, but again, these
ideas are conjecture, and again, the algorithm is constantly
changing. Your best bet is to create a content-rich web site and
update it often. Google appreciates good content.
Of course, being listed in Google's index is not the only way to
tell visitors about your site. You also have the option to
advertise on Google.
8.5. Keeping Up with Google's Changes
With Google having such a leading position in the search engine
world and so many webmasters looking to Google for traffic, you
might guess that there's a lot of discussion about Google in
various places around the Web. And you'd be right! My favorite
place for Google news and gossip is Webmaster World
(http://www.webmasterworld.com). It's not often that the terms
civilized and online forums go together, but they do in this case.
Discourse on this site is friendly, informative, and generally
flame-free. I have learned a lot from this site.
There are also a few weblogs focused on Google and searching in
general:
Google Weblog (http://google.blogspace.com) keeps on top of everything Google, from the newest search syntax to Google's holiday logos (http://www.google.com/holidaylogos.html).
Google Blog (http://www.google.com/googleblog) is the official Google weblog and features announcements, pointers, and behind-the-scenes commentary from the Googleplex.
John Battelle's Searchblog (http://battellemedia.com) covers search in all forms.
8.6. In a Word: Relax
One of the things that I have learned is that a lot of people
spend a lot of time worrying about how Google works, and further,
they worry about how they can get the highest possible ranking.
I can appreciate their concern because search engine traffic
means a lot to an online business. But the rest of us should just
relax. As long as we concentrate on content that's good for
visitors (and not just spiders), Google's ranking algorithms will
appreciate our sites.
Hack 86. A Webmaster's Introduction to
Google
Steps to take for optimal Google indexing of your site.
The cornerstone of any good search engine is highly relevant
results. Google's unprecedented success has been due to its
uncanny ability to match quality information with a user's search
terms. The core of Google's search results is based on a
patented algorithm called PageRank.
There is an entire industry focused on getting sites listed near
the top of search engines. Google has proven to be the toughest
search engine for a site to do well on. Even so, it isn't all that
difficult for a new web site to get listed and begin receiving some
traffic from Google.
Learning the ins and outs of getting your site listed by a search
engine can be a daunting task. There is a vast array of
information about search engines on the Web, and not all of it is
useful or proper. This discussion of getting your site into the
Google database focuses on long-term techniques for
successfully promoting your site through Google, helping to
avoid some of the common misconceptions and problems that a
new site owner might face.
8.7.1. Search Engine Basics
When you type a term into a search site, the engine looks up
potential matches in its database and presents the best web
page matches first. How those web pages get into the database,
and consequently, how you can get yours in there too, is a
three-step process:
1. A search engine visits a site with an automated program
called a spider (sometimes called a robot). A spider is
just a program similar to a web browser that downloads a
site's pages. It doesn't actually display the page
anywhere; it just downloads the page data.
2. After the spider has acquired the page, the search
engine passes the page to a program called an indexer,
which is another robotic program that extracts most of
the visible portions of the page. The indexer also
analyzes the page for keywords, the title, links, and
other important information contained in the code.
3. The search engine adds your site to its database and
makes it available to searchers. The greatest difference
between search engines is in this final step where
ranking or result position under a particular keyword is
determined.
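The indexer in step 2 can be sketched in a few lines of Python using the standard library's HTML parser. This is a toy illustration of the idea, not how Google's indexer works: the class name is hypothetical, the page is an inline string rather than something a spider downloaded, and "visible text" is approximated as anything outside title, script, and style elements.

```python
from html.parser import HTMLParser

class PageIndexer(HTMLParser):
    """Collect the title, link targets, and visible words of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self.words = []
        self._in_title = False
        self._skip_depth = 0  # greater than zero inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif not self._skip_depth:
            self.words.extend(data.lower().split())

page = ('<html><head><title>Fresh Bread</title><style>p {color: red}</style></head>'
        '<body><p>Bake <a href="/recipes/rye.html">rye bread</a> today.</p></body></html>')
indexer = PageIndexer()
indexer.feed(page)
indexer.close()
print(indexer.title, indexer.links, indexer.words)
```

The links it collects are what a spider would queue up to visit next, which is how step 1 feeds on the output of step 2.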
8.7.2. Submitting Your Site to Google
The first step is to get your pages listed in the database, and
there are two ways to go about it. The first is direct submission
of your site's URL to Google via its add URL or submission page.
To counter programmed robots, search engines routinely move
submission pages around on their sites. You can find Google's
submission page linked from their Help pages or Webmaster Info
pages (http://www.google.com/addurl.html).
Just visit Google's add URL page and enter the main index page
for your site into the submission form, and press submit.
Google's spider (called GoogleBot) will visit your page usually
within four weeks. The spider will traverse all pages on your site
and add them to its index. Within eight weeks, you should be
able to find your site listed in Google.
The second way to get your site listed is to let Google find you
based on links that may be pointing to your site. Once
GoogleBot finds a link to your site from a page it already has in
its index, it will visit your site.
Google has been updating its database on a monthly basis for
three years. It sends its spider out in crawler mode once a
month, too. Crawler mode is a special mode in which a spider
traverses, or crawls, the entire Web. As it runs into links to pages,
it indexes those pages in a never-ending attempt to download all
the pages it can. Once your pages are listed in Google, they are
revisited and updated on a monthly basis. If you frequently
update your content, Google may index your search terms more
often.
Once you are indexed and listed in Google, the next natural
question for a site owner is, "How can I rank better under my
applicable search terms?"
8.7.3. The Search Engine Optimization
Template
This is my general recipe for the ubiquitous Google. It is generic
enough that it works well everywhere. It's as close as I have
come to a "one-size-fits-all" SEO (that's Search Engine
Optimization) template.
Use your targeted keyword phrase:
In META keywords. It's not necessary for Google, but a good habit. Keep your META keywords short (128 characters max, or 10 keywords).
In META description. Keep keyword close to the left but in a full sentence.
In the title at the far left but possibly not as the first word.
In the top portion of the page in the first sentence of the first full paragraph (plain text: no bold, no italic, no style).
In an H3 or larger heading.
In bold: second paragraph if possible and anywhere but the first usage on page.
In italic: anywhere but the first usage.
In subscript/superscript.
In URL (directory name, filename, or domain name). Do not duplicate the keyword in the URL.
In an image filename used on the page.
In ALT tag of that previous image mentioned.
In the title attribute of that image.
In link text to another site.
In an internal link's text.
In title attribute of all links targeted in and out of page.
In the filename of your external CSS (Cascading Style Sheet) or JavaScript file.
In an inbound link on site (preferably from your home page).
In an inbound link from off site (if possible).
In a link to a site that has a PageRank of 8 or better.
Other search engine optimization things to consider include:
Use "last modified" headers if you can.
Validate that HTML. Some feel Google's parser has become stricter at parsing instead of milder. It will miss an entire page because of a few simple errors; we have tested this in depth.
Use an HTML template throughout your site. Google can spot the template and parse it off. (Of course, this also means they are pretty good at spotting duplicate content.)
Keep the page as .html or .htm extension. Any dynamic extension is a risk.
Keep the HTML below 20K; 5 to 15K is the ideal range.
Keep the ratio of text to HTML very high. Text should outweigh HTML by significant amounts.
Double-check your page in Netscape, Opera, and Internet Explorer. Use Lynx if you have it.
Use only raw HREFs for links. Keep JavaScript far, far away from links. The simpler the link code the better.
The traffic comes when you figure out that 1 referral a day to 10 pages is better than 10 referrals a day to 1 page.
Don't assume that keywords in your site's navigation template will be worth anything at all. Google looks for full sentences and paragraphs. Keywords just lying around orphaned on the page are not worth as much as when used in a sentence.
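The advice to keep the ratio of text to HTML high can be checked roughly with a few lines of Python. This is only a sketch under stated assumptions: the function name text_to_html_ratio and the sample page are made up for illustration, whitespace is ignored when counting text, and the measure is a simple character ratio, not anything Google is known to compute.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the character data found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Note: script and style contents also arrive here; a stricter
        # check would skip them.
        self.chunks.append(data)


def text_to_html_ratio(html):
    """Return non-whitespace text characters divided by total page characters."""
    if not html:
        return 0.0
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = "".join(collector.chunks)
    visible = len("".join(text.split()))  # drop all whitespace before counting
    return visible / len(html)


# Hypothetical page used only as an example.
page = ("<html><head><title>Oranges</title></head>"
        "<body><p>Fresh oranges daily.</p></body></html>")
print(f"text-to-HTML ratio: {text_to_html_ratio(page):.2f}")
```

A low number suggests the markup is outweighing the words, which is the situation the template warns against.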
Brett Tabke
Hack 87. Get Inside the PageRank Algorithm
Delve into the inner workings of the Google PageRank algorithm
and how it affects results.
PageRank is the algorithm used by the Google search engine,
originally formulated by Sergey Brin and Larry Page in their
paper "The Anatomy of a Large-Scale Hypertextual Web Search
Engine."
It is based on the premise, prevalent in the world of academia,
that the importance of a research paper can be judged by the
number of citations the paper has from other research papers.
Brin and Page have simply transferred this premise to its Web
equivalent: the importance of a web page can be judged by the
number of hyperlinks pointing to it from other web pages.
8.8.1. So What Is the Algorithm?
It may look daunting to non-mathematicians, but the PageRank
algorithm is in fact elegantly simple and is calculated as follows:
$$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)$$
PR(A) is the PageRank of a page A.
PR(T1) is the PageRank of a page T1 that links to A.
C(T1) is the number of outgoing links from the page T1.
d is a damping factor in the range 0 < d < 1, usually set to 0.85.
The PageRank of a web page is therefore calculated as a sum of
the PageRanks of all pages linking to it (its incoming links),
each divided by the number of links on that linking page (its
outgoing links).
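To make the formula concrete, here is a minimal Python sketch that applies it repeatedly to a tiny link graph. The function name pagerank, the three-page site, the starting value of 1 per page, and the number of iterations are all assumptions made for illustration; this is not Google's implementation.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over a small link graph.

    `links` maps each page name to the list of pages it links to.
    """
    pr = {page: 1.0 for page in links}  # assumed starting value of 1 per page
    for _ in range(iterations):
        updated = {}
        for page in links:
            # Sum PR(T) / C(T) over every page T that links to this page.
            incoming = sum(pr[src] / len(out)
                           for src, out in links.items() if page in out)
            updated[page] = (1 - d) + d * incoming
        pr = updated
    return pr


# Hypothetical three-page site: the home page links to two subpages,
# and each subpage links back to the home page.
site = {
    "home": ["about", "contact"],
    "about": ["home"],
    "contact": ["home"],
}
ranks = pagerank(site)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {value:.3f}")
```

In this toy graph the home page ends up with the highest value because both subpages pass their full share to it, while the home page splits its own share in two.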
8.8.2. And What Does This Mean?
From a search engine marketer's point of view, this means there
are two ways in which PageRank can affect the position of your
page on Google:
The number of incoming links. Obviously, the more of these, the better. But there is another thing the algorithm tells us: no incoming link can have a negative effect on the PageRank of the page it points at. At worst, it can simply have no effect at all.
The number of outgoing links on the page that points to your page. The fewer of these, the better. This is interesting: it means that, given two pages of equal PageRank linking to you, one with 5 outgoing links and the other with 10, you will get twice the increase in PageRank from the page with only 5 outgoing links.
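The "twice the increase" claim follows directly from the formula. As a short worked step, write p for the PageRank shared by the two linking pages (a symbol introduced here only for illustration). Each linking page T contributes d * PR(T) / C(T) to your page, so:

```latex
\frac{d \cdot p / 5}{d \cdot p / 10} = \frac{10}{5} = 2
```

The damping factor and the shared PageRank cancel, leaving only the ratio of the outgoing link counts.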
At this point, we take a step back and ask ourselves just how
important PageRank is to the position of your page in the Google
search results.
The next thing that we can observe about the PageRank
algorithm is that it has nothing whatsoever to do with relevance
to the search terms queried. It is simply a single (admittedly
important) part of the entire Google relevance ranking algorithm.
Perhaps a good way to look at PageRank is as a multiplying
factor applied to the Google search results after all other
computations have been completed. The Google algorithm first
calculates the relevance of pages in its index to the search
terms, and then multiplies this relevance by the PageRank to
produce a final list. The higher your PageRank, therefore, the
higher up the results you will be, but there are still many other
factors related to the positioning of words on the page that must
be considered first.
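The "multiplying factor" view can be pictured with a toy example. Everything in this sketch is assumed for illustration: the function name rank_results, the relevance scores, and the PageRank values are invented, and the real ranking algorithm is not public.

```python
def rank_results(pages):
    """Order pages by relevance * PageRank, highest first.

    Each page is a (name, relevance, pagerank) tuple; both scores are
    hypothetical numbers chosen only to illustrate the multiplication.
    """
    scored = [(name, relevance * pagerank)
              for name, relevance, pagerank in pages]
    return sorted(scored, key=lambda item: item[1], reverse=True)


candidates = [
    ("on-topic page, low PageRank", 0.9, 1.0),
    ("on-topic page, high PageRank", 0.9, 3.0),
    ("off-topic page, high PageRank", 0.1, 5.0),
]
results = rank_results(candidates)
for name, score in results:
    print(f"{score:.2f}  {name}")
```

Under this model a high PageRank lifts a relevant page above an equally relevant rival, but it cannot rescue a page that scores poorly on relevance in the first place.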
8.8.3. So What's the Use of the PageRank Calculator?
If no incoming link has a negative effect, surely I should just get
as many as possible, regardless of the number of outgoing links
on its page?
Well, not entirely. The PageRank algorithm is cleverly balanced.
Just like the conservation of energy in physics with every
reaction, PageRank is also conserved with every calculation. For
instance, if a page with a starting PageRank of 4 has two
outgoing links on it, we know that the amount of PageRank it
passes on is divided equally between all of its outgoing links. In
this case, 4 / 2 = 2 units of PageRank is passed on to each of 2
separate pages, and 2 + 2 = 4, so the total PageRank is
preserved!
There are scenarios in which you may find
that total PageRank is not conserved after
a calculation. PageRank itself is supposed
to represent a probability distribution, with
the individual PageRank of a page
representing the likelihood of a random
surfer chancing upon it.
On a much larger scale, supposing Google's index contains a
billion pages, each with a PageRank of 1, the total PageRank
across all pages is equal to a billion. Moreover, each time we
recalculate PageRank, no matter what changes in PageRank may
occur between individual pages, the total PageRank across all
one billion pages will still add up to a billion.
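The conservation idea can be checked on a small scale. The sketch below applies the formula to a hypothetical four-page site and sums the results. It also shows one scenario, assumed here as an example of the exceptions mentioned in the note, in which the total drops: a page with no outgoing links passes its share to nobody. The page names and the helper pagerank_step are made up for illustration.

```python
def pagerank_step(pr, links, d=0.85):
    """Apply the PageRank formula once to every page and return the new values."""
    return {
        page: (1 - d) + d * sum(pr[src] / len(out)
                                for src, out in links.items() if page in out)
        for page in links
    }


# Hypothetical four-page site in which every page has at least one outgoing link.
closed_site = {
    "home": ["products", "about"],
    "products": ["home", "contact"],
    "about": ["home"],
    "contact": ["home"],
}
pr = {page: 1.0 for page in closed_site}
for _ in range(20):
    pr = pagerank_step(pr, closed_site)
closed_total = sum(pr.values())  # stays at 4, the number of pages

# Here "orphan" has no outgoing links, so its share is passed to nobody.
leaky_site = {"home": ["orphan"], "orphan": []}
pr = {page: 1.0 for page in leaky_site}
for _ in range(20):
    pr = pagerank_step(pr, leaky_site)
leaky_total = sum(pr.values())  # ends up below 2

print(f"closed site total: {closed_total:.4f}")
print(f"leaky site total:  {leaky_total:.4f}")
```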
First, this means that, although we may not be able to change
the total PageRank across all pages, by strategic linking of
pages within our site, we can affect the distribution of PageRank
between pages. For instance, we may want most of our visitors
to come into the site through our home page. We would therefore
want our home page to have a higher PageRank relative to other
pages within the site. We should also recall that all the
PageRank of a page is passed on and divided equally between
each outgoing link on a page. We would therefore want to keep as
much combined PageRank as possible within our own site
without passing it on to external sites and losing its benefit. This
means we would want any page with lots of external links (i.e.,
links to other people's web sites) to have a lower PageRank
relative to other pages within the site to minimize the amount of
PageRank that is leaked to external sites. Also, bear in mind our
earlier statement, that PageRank is simply a multiplying factor
applied once Google's other calculations regarding relevance
have already been calculated. We would therefore want our more
keyword-rich pages to also have a higher relative PageRank.
Second, if we assume that every new page in Google's index
begins its life with a PageRank of 1, there is a way we can
increase the combined PageRank of pages within our site: by
increasing the number of pages! A site with 10 pages will start
life with a combined PageRank of 10, which is then redistributed
through its hyperlinks. A site with 12 pages will therefore start
with a combined PageRank of 12. We can thus improve the
PageRank of our site as a whole by creating new content (i.e.,
more pages), and then control the distribution of that combined
PageRank through strategic interlinking between the pages.
And this is the purpose of the PageRank Calculator: to create a
model of the site on a small scale including the links between
pages, and see what effect the model has on the distribution of
PageRank.
8.8.4. How Does the PageRank Calculator Work?
To get a better idea of the realities of PageRank, visit the
PageRank Calculator
(http://www.markhorrell.com/seo/pagerank.asp).
It's simple, really. Start by typing in the number of interlinking
pages that you wish to analyze and hit Submit. I have confined
this number to just 20 pages to ease server resources. Even so,
this should give a reasonable indication of how strategic linking
can affect the PageRank distribution.
Next, for ease of reference once the calculation has been
performed, provide a label for each page (e.g., Home Page, Links
Page, Contact Us Page, etc.), and again hit Submit.
Finally, use the list boxes to select which pages each page links
to. You can use Ctrl and Shift to highlight multiple selections.
You can also use this screen to change the initial PageRanks of
each page. For instance, if one of your pages is supposed to
represent Yahoo!, you may wish to raise its initial PageRank to,
say, 3. However, in actuality, starting PageRank is irrelevant to
its final computed value. In other words, even if one page were to
start with a PageRank of 100, after many iterations of the
equation, the final computed PageRank will converge to the
same value as it would had it started with a PageRank of only 1!
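The claim that the starting PageRank does not matter can be tested with a small script. This is a sketch, not the calculator's own code: the function pagerank, the four-page graph, and the choice of 200 iterations are assumptions made for illustration.

```python
def pagerank(links, initial, d=0.85, iterations=200):
    """Iterate the PageRank formula from the given starting values."""
    pr = dict(initial)
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(pr[src] / len(out)
                                    for src, out in links.items() if page in out)
            for page in links
        }
    return pr


# Hypothetical model site; "yahoo" stands in for a well-known external page.
site = {
    "home": ["links", "contact"],
    "links": ["home", "yahoo"],
    "contact": ["home"],
    "yahoo": ["home"],
}
even_start = {page: 1.0 for page in site}
skewed_start = dict(even_start, yahoo=100.0)  # one page starts at 100

from_even = pagerank(site, even_start)
from_skewed = pagerank(site, skewed_start)
largest_gap = max(abs(from_even[p] - from_skewed[p]) for p in site)
print(f"largest difference after iterating: {largest_gap:.2e}")
```

Each pass multiplies whatever difference remains by the damping factor, so the head start of 100 fades away after enough iterations.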
You can play around with the damping factor d, which defaults to
0.85, as this is the value quoted in Brin and Page's research
paper.
Mark Horrell
Hack 88. 26 Steps to 15K a Day
Hot and cold running content is what draws visitors in to your
web site.
Too often, getting visitors from search engines is boiled down to
a succession of tweaks that may or may not work. But as I show
in this hack, solid content thoughtfully put together can make
more of an impact than a decade's worth of fiddling with META tags
and building the perfect title page.
Following these 26 steps from A to Z will ensure a successful
site, bringing in plenty of visitors from Google.
8.9.1. A. Prep Work
Prepare work and begin building content. Long before the domain
name is settled on, start putting together notes to build at least
a 100-page site. That's 100 pages of "real content," not
including link, resource, about, and copyright pages (necessary
but not content-rich pages).
Can't think of 100 pages' worth of content? Consider articles
about your business or industry, Q&A pages, or back issues of
an online newsletter.
8.9.2. B. Choose a Brandable Domain Name
Choose a domain name that's easily brandable. You want
Google.com and not Mykeyword.com.
Keyword domains are out; branding and name recognition are in.
Big time in. Keywords in a domain name have never meant less
to search engines. Learn the lesson of Goto.com becoming
Overture.com and understand why they changed it. It's one of
the most powerful gut check calls I've ever seen on the Internet.
It took resolve and nerve to blow away several years of
branding. (That's a whole 'nother article, but learn the lesson as
it applies to all of us.)
8.9.3. C. Site Design
The simpler your site design, the better. As a rule of thumb: text
content should outweigh HTML content. The pages should be
validated and usable in everything from Lynx to leading
browsers. In other words, keep it close to HTML 3.2 if you can.
Spiders are not to the point they really like eating HTML 4.0 and
the mess that it can bring. Stay away from heavy Flash, Java, or
JavaScript.
Go external with scripting languages if you must have them,
though there's little reason to have them that I can see. They
will rarely help a site and stand to hurt it greatly due to many
factors that most people don't appreciate (the search engines'
distaste for JavaScript is just one of them). Arrange the site in a
logical manner with directory names hitting the top keywords
that you wish to emphasize. You can also go the other route and
just throw everything in the top level of the directory (this is
rather controversial, but it's been producing good long-term
results across many engines). Don't clutter and don't spam your
site with frivolous links such as "best viewed in ..." or other
things such as counters. Keep it clean and professional to the
best of your ability.
Learn the lesson of Google itself: simple is retro cool. Simple is
what surfers want.
Speed isn't everything; it's almost the only thing. Your site
should respond almost instantly to a request. If your site has
three to four seconds' delay until "something happens" in the
browser, you are in trouble. That three to four seconds of
response time may vary in sites destined to be viewed in other
countries than your native one. The site should respond locally
within three to four seconds (maximum) to any request. Longer
than that, and you'll lose 10% of your audience for each
additional second. That 10% could be the difference between
success and not.
8.9.4. D. Page Size
The smaller the page size, the better. Keep it under 15K,
including images, if you can. The smaller the better. Keep it
under 12K if you can. The smaller the better. Keep it under 10K if
you can; I trust you are getting the idea here. Over 5K and
under 10K. It's tough to do, but it's worth the effort. Remember,
80% of your surfers will be at 56K or even less.
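A little arithmetic shows why the page sizes above matter on a 56K modem. The sketch below assumes the full nominal 56,000 bits per second is available and ignores latency and protocol overhead, so real transfers would be slower; the function name download_seconds is made up for illustration.

```python
def download_seconds(page_bytes, modem_bits_per_second=56_000):
    """Best-case transfer time: bytes * 8 bits, divided by the line speed."""
    return page_bytes * 8 / modem_bits_per_second


for kilobytes in (5, 10, 15, 50):
    seconds = download_seconds(kilobytes * 1024)
    print(f"{kilobytes:>3}K page: about {seconds:.1f} s at 56K")
```

Even in this best case a 15K page takes a little over two seconds, which already uses up most of the three-to-four-second response budget suggested under Site Design.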
8.9.5. E. Content
Build one page of content (between 200 and 500 words) per day
and put it online.
If you aren't sure what you need for content, start with the
Overture keyword suggestor
(http://inventory.overture.com/d/searchinventory/suggestion/)
and find the core set of keywords for your topic area. Those are
your subject starters.
8.9.6. F. Keyword Density and Keyword Positioning
This is simple, old-fashioned Search Engine Optimization (SEO)
from the ground up.
Use the keyword once in title, once in description tag, once in a
heading, once in the URL, once in bold, once in italic, once high
on the page, and make sure the density is between 5% and 20%
(don't fret about it). Use good sentences and spellcheck them!
Spellchecking is becoming more important as search engines
are moving to autocorrection during searches. There is no longer
a reason to look like you can't spell.
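Keyword density is easy to estimate. The sketch below treats density as occurrences of a single keyword divided by the total word count, which is one common reading of the term and an assumption here; the function name keyword_density and the sample text are invented for illustration.

```python
import re


def keyword_density(text, keyword):
    """Fraction of the words in `text` that equal `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


sample = ("Fresh oranges daily. Our oranges are picked ripe "
          "and shipped the same day.")
density = keyword_density(sample, "oranges")
print(f"density of 'oranges': {density:.1%}")
```

For a multiword phrase you would need to count phrase matches instead of single words; this sketch does not attempt that.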
8.9.7. G. Outbound Links
From every page, link to one or two high-ranking sites under the
keyword that you're trying to emphasize. Use your keyword in
the link text (this is ultra important for the future).
8.9.8. H. Cross-Links
Cross-links are links within the same site.
Link to on-topic quality content across your site. If a page is
about food, make sure it links to the apples and veggies page.
With Google, on-topic cross-linking is very important for sharing
your PageRank value across your site. You do not want an
all-star page that outperforms the rest of your site. You want 50
pages that produce 1 referral each a day, not 1 page that
produces 50 referrals a day. If you do find a page that drastically
outproduces the rest of the site with Google, you need to offload
some of that PageRank value to other pages by cross-linking
heavily. It's the old share-the-wealth thing.
8.9.9. I. Put It Online
Don't go with virtual hosting; go with a standalone IP address.
Make sure the site is crawlable by a spider. All pages should be
linked to more than one other page on your site, and not more
than two levels deep from the top directory. Link the topic
vertically as much as possible back to the top directory. A menu
that is present on every page should link to your site's main
topic index pages (the doorways and logical navigation system
down into real content). Don't put your site online before it is
ready. It's worse to put a nothing site online than no site at all.
You want it to be fleshed out from the start.
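One way to check crawlability before going live is to model the site's internal links and measure how many clicks each page is from the top. The sketch below reads "levels deep" as link clicks from the index page, which is an assumption (it could also mean directory depth), and the page names and the function click_depths are hypothetical.

```python
from collections import deque


def click_depths(links, start):
    """Breadth-first search: fewest clicks needed to reach each page from `start`."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link map: page -> pages it links to.
site = {
    "index": ["fruit", "veggies"],
    "fruit": ["index", "apples"],
    "veggies": ["index", "carrots"],
    "apples": ["fruit", "apple-recipes"],
    "carrots": ["veggies"],
    "apple-recipes": ["apples"],
    "orphan": ["index"],  # nothing links to this page
}
depths = click_depths(site, "index")
too_deep = sorted(page for page, depth in depths.items() if depth > 2)
unreachable = sorted(set(site) - set(depths))
print("more than two clicks from the top:", too_deep)
print("not reachable by following links:", unreachable)
```

Pages in either list are candidates for an extra link from a menu or topic index page.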
Go for a listing in the Open Directory Project (ODP,
http://dmoz.org/add.html). Getting accepted to the ODP will
probably get your pages listed in the Google Directory.
8.9.10. J. Submit
Submit your main URL to Google, F*, AltaVista, WiseNut,
Teoma, DirectHit, and Hotbot. Now comes the hard part: forget
about submissions for the next six months. That's right, submit
and forget.
8.9.11. K. Logging and Tracking
Get a quality logger/tracker that can do justice to inbound
referrals based on log files. Don't use a graphic counter; you
need a program that's going to provide much more information
than that. If your host doesn't support referrers, back up and get
a new host. You can't run a modern site without full referrals
available 24/7/365 in real time.
8.9.12. L. Spiderings
Watch for spiders from search engines (one reason you need a
good logger and tracker!). Make sure that spiders crawling the full
site can do so easily. If not, double-check your linking system to
make sure that the spider can find its way throughout the site.
Don't fret if it takes two spiderings to get your whole site done
by Google or F*. Other search engines are potluck; if you
haven't been added within six months, it's doubtful that you will
be added at all.
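Watching for spiders usually means scanning the server's access log for their User Agent strings. The sketch below assumes the log is in the common "combined" format used by many web servers and that the spider identifies itself with "Googlebot" in its User Agent. The sample log lines, the page list, and the function pages_fetched_by are invented for illustration.

```python
import re

# Combined log format: host, identity, user, [time], "request", status,
# size, "referrer", "user agent".
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def pages_fetched_by(log_lines, agent_substring):
    """Return the set of paths requested by agents whose name contains the substring."""
    fetched = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and agent_substring.lower() in match.group("agent").lower():
            fetched.add(match.group("path"))
    return fetched


# Made-up sample entries; real logs would be read from a file.
sample_log = [
    '192.0.2.10 - - [10/Oct/2004:13:55:36 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.10 - - [10/Oct/2004:13:55:40 -0700] "GET /apples.html HTTP/1.0" '
    '200 4100 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '198.51.100.7 - - [10/Oct/2004:14:02:11 -0700] "GET /veggies.html HTTP/1.1" '
    '200 3900 "http://www.google.com/search?q=veggies" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]
crawled = pages_fetched_by(sample_log, "Googlebot")
all_pages = {"/index.html", "/apples.html", "/veggies.html"}
missed = all_pages - crawled
print("crawled:", sorted(crawled))
print("not yet crawled:", sorted(missed))
```

Pages that stay in the "not yet crawled" list after a couple of spiderings are the ones whose inbound links deserve a second look.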
8.9.13. M. Topic Directories
Almost every keyword sector has an authority hub on its topic.
Find it (Google Directory can be very helpful here because you
can view sites based on how popular they are) and submit within
the guidelines.
8.9.14. N. Links
Look around your keyword section in the Google Directory; this
is best done after getting an Open Directory Project listing or
two. Find sites that have link pages or freely exchange links.
Simply request a swap. Put a page of on-topic, in-context links
on your site as a collection spot. Don't worry if you can't get
people to swap links; move on. Try to swap links with one fresh
site a day. A simple personal email is enough. Stay low-key
about it and don't worry if site Z won't link to you. Eventually it
will.
8.9.15. O. Content
Add one page of quality content per day. Timely, topical articles
are always the best. Try to stay away from too much weblogging
personal materials and look more for article topics that a general
audience will like. Hone your writing skills and read up on the
right style of web speak that tends to work with the fast and
furious web crowd: lots of text breaks, short sentences, lots of
dashes; something that reads quickly.
Most web users don't actually read; they scan. This is why it is
so important to keep key pages to a minimum. If people see a
huge overblown page, a portion of them will hit the Back button
before trying to decipher it. They've got better things to do than
waste 15 seconds (a stretch) at understanding your whizbang
menu system. Because some big support site can run
Flash-heavy pages is no indication that you can. You don't have the
pull factor that they do.
Use headers and bold standout text liberally on your pages as
logical separators. I call them scanner stoppers, where the eye
will logically come to rest on the page.
8.9.16. P. Gimmicks
Stay far away from any fads of the day or anything that appears
spammy, unethical, or tricky. Plant yourself firmly on the high
ground in the middle of the road.
8.9.17. Q. Linkbacks
When you receive requests for links, check sites out before
linking back to them. Check them through Google for their
PageRank value. Look for directory listings. Don't link back to
junk just because they asked. Make sure it is a site similar to
yours and on-topic. Linking to bad neighborhoods, as Google
calls them, can actually cost you PageRank points.
8.9.18. R. Rounding Out Your Offerings
Use options such as "email a friend," forums, and mailing lists to
round out your site's offerings. Hit the top forums in your market
and read, read, read until your eyes hurt. Stay away from affiliate
fads that insert content onto your site such as banners and
pop-up windows.
8.9.19. S. Beware of Flyer and Brochure Syndrome
If you have an economical site or online version of bricks and
mortar, be careful not to turn your site into a brochure. These
don't work at all. Think about what people want. They aren't
coming to your site to view your content, they are coming to your
site looking for their content. Talk as little about your products
and yourself as possible in articles (sounds counterintuitive,
doesn't it?).
8.9.20. T. Keep Building One Page of Content Per Day
Head back to the Overture suggestion tool
(http://inventory.overture.com/d/searchinventory/suggestion/)
to get ideas for fresh pages.
8.9.21. U. Study Those Logs
After a month or two, you will start to see a few referrals from
places you've gotten listed. Look for the keywords people are
using. See any bizarre combinations? Why are people using
those to find your site? If there is something you have
overlooked, then build a page around that topic. Engineer your
site to feed the search engine what it wants. If your site is about
oranges, but your referrals are all about orange citrus fruit, then
get busy building articles around citrus and fruit instead of the
generic oranges. The search engines will tell you exactly what
they want to be fed. Listen closely! There is gold in referral logs;
it's just a matter of panning for it.
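Panning the referral logs for keywords can be partly automated. The sketch below assumes the search phrase travels in a q= query parameter of the referring URL, as in the made-up Google referrers shown; other engines may use a different parameter name. The function search_phrases and the sample data are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse


def search_phrases(referrers, param="q"):
    """Count the search phrases found in the given query parameter of each referrer."""
    phrases = Counter()
    for referrer in referrers:
        query = parse_qs(urlparse(referrer).query)
        for phrase in query.get(param, []):
            phrases[phrase.lower()] += 1
    return phrases


# Made-up referrer strings of the kind found in an access log.
sample_referrers = [
    "http://www.google.com/search?q=orange+citrus+fruit&hl=en",
    "http://www.google.com/search?hl=en&q=Orange+citrus+fruit",
    "http://www.google.com/search?q=oranges",
    "http://www.example.com/links.html",  # an ordinary link, no search phrase
]
counts = search_phrases(sample_referrers)
for phrase, hits in counts.most_common():
    print(f"{hits:>3}  {phrase}")
```

The phrases at the top of the list are the ones to build new articles around.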
8.9.22. V. Timely Topics
Nothing breeds success like success. Stay abreast of
developments in your topic of interest. If big site Z is coming out
with product A at the end of the year, build a page and have it
ready in October so that search engines get it by December.
8.9.23. W. Friends and Family
Networking is critical to the success of a site. This is where all
that time you spend in forums will pay off. Here's the catch-22
about forums: lurking is almost useless. The value of a forum is
in the interaction with your fellow colleagues and cohorts. You
learn from the interaction, not by just reading. Networking will
pay off in linkbacks, tips, and email exchanges, and will
generally put you in the loop of your keyword sector.
8.9.24. X. Notes, Notes, Notes
If you build one page per day, you will find that brainstorm-like
inspiration will hit you in the head at some magic point. Whether
it is in the shower (dry off first), driving down the road (please
pull over), or just parked at your desk, write it down! Ten
minutes of work later, you will have forgotten all about that great
idea you just had. Write it down and get detailed about what you
are thinking. When the inspirational juices are no longer flowing,
come back to those content ideas. It sounds simple, but it's a
lifesaver when the ideas stop coming.
8.9.25. Y. Submission Check at Six Months
After six months, walk back through your submissions and see if
you have been listed in all the search engines that you
submitted to. If not, resubmit and forget again. Try those freebie
directories again, too.
8.9.26. Z. Keep Building Those Pages of Quality Content!
Starting to see a theme here? Google loves content, lots of
quality content. The content that you generate should be based
around a variety of keywords. At the end of a year's time, you
should have around 400 pages of content. That will get you good
placement under a wide range of keywords, generate reciprocal
links, and overall position your site to stand on its own two feet.
Do these 26 things, and I guarantee you that in one year's time
you will call your site a success. It will be drawing between 500
and 2,000 referrals a day from search engines. If you build a
good site and achieve an average of four to five pageviews per
visitor, you should be in the 10K-15K page views per day range
in one year's time. What you do with that traffic is up to you!
Brett Tabke
Hack 89. Be a Good Search Engine Citizen
Five don'ts and one do for getting your site indexed by Google.
A high ranking in Google can mean a great deal of traffic.
Because of that, there are lots of people spending lots of time
trying to figure out the infallible way to get a high ranking from
Google. Add this. Remove that. Get a link from this. Don't post a
link to that.
Submitting your site to Google to be indexed is simple enough.
Google's got a site submission form
(http://www.google.com/addurl.html), though they say that, if
your site has at least a few inbound links (other sites that link to
you), they should find you that way. In fact, Google encourages
URL submitters to get listed on The Open Directory Project
(ODP, http://www.dmoz.org) or Yahoo! (http://www.yahoo.com).
Nobody knows the secret of achieving high PageRank without
effort. Google uses a variety of elements, including page
popularity, to determine PageRank. PageRank is one of the
factors determining how high up a page appears in search
results. But there are several things that you should not be
doing and one big thing that you absolutely should.
Does breaking one of these rules mean that you're automatically
going to be thrown out of Google's index? No, there are over four
billion pages in Google's index at this writing, and it's unlikely
that they'll find out about your violation immediately. But there's
a good chance that they'll find out eventually. Is it worth
having your site removed from the most popular search engine
on the Internet?
8.10.1. Thou Shalt Not:
Cloak
Cloaking is when your web site is set up such that
search engine spiders get different pages from those
that human surfers get. How does the web site know
which are the spiders and which are the humans? By
identifying the spider's User Agent or IP (the latter being
the more reliable method).
An Internet Protocol (IP) address is the computer
address from which a spider comes. Everything that
connects to the Internet has an IP address. Sometimes
the IP address is always the same, as with web sites.
Sometimes the IP address changes; that's called a
dynamic address. (If you use a dial-up modem, chances
are good that every time you log onto the Internet your
IP address is different. That's a dynamic IP address.)
A User Agent is a way a program that surfs the Web
identifies itself. Internet browsers like Mozilla use User
Agents, as do search engine spiders. There are literally
dozens of different kinds of User Agents; see the Web
Robots Database
(http://www.robotstxt.org/wc/active.html) for an
extensive list.
Advocates of cloaking claim that cloaking is useful to
absolutely optimize content for spiders. Anti-cloaking
critics claim that cloaking is an easy way to
misrepresent site content: feeding a spider a page that's
designed to get the site hits for pudding cups when
actually it's all about baseball bats. You can get more
details about cloaking and different perspectives on it at
http://pandecta.com/,
http://www.apromotionguide.com/cloaking.html, and
http://www.webopedia.com/TERM/C/cloaking.html.
Hide text
Text is hidden by putting words or links in a web page
that are the same color as the page's background: putting
white words on a white background, for example. This is
also called fontmatching. Why would you do this?
Because a search engine spider could read the words
you've hidden on the page while a human visitor
couldn't. Again, doing this and getting caught could get
you banned from Google's index, so don't.
That goes for other page content tricks too, such as title
stacking (putting multiple copies of a title tag on one
page), putting keywords in comment tags, keyword
stuffing (putting multiple copies of keywords in very
small font on page), putting keywords not relevant to
your site in your META tags, and so on. Google doesn't
provide an exhaustive list of these types of tricks on
their site, but any attempt to circumvent or fool their
ranking system is likely to be frowned upon. Their
attitude is more like: "You can do anything you want to
with your pages, and we can do anything we want to with
our index, such as excluding your pages."
Use doorway pages
Sometimes, doorway pages are called gateway pages.
These are pages that are aimed specifically at one topic,
which don't have a lot of their own original content and
which lead to the main page of a site (thus the name
doorway pages).
For example, say you have a page devoted to cooking.
You create doorway pages for several genres of
cooking: French cooking, Chinese cooking, vegetarian
cooking, etc. The pages contain terms and META tags
relevant to each genre, but most of the text is a copy of
all the other doorway pages, and all it does is point to
your main site.
Doorway pages are illegal in Google and annoying to the
Google user, so don't do it. You can learn more about
doorway pages at
http://searchenginewatch.com/webmasters/bridge.html
or
http://www.searchengineguide.com/whalen/2002/0530_jw
Check your link rank with automated queries
Using automated queries (except for the sanctioned
Google API) is against Google's Terms of Service
anyway. Using an automated query to check your
PageRank every 12 seconds is triple-bad; it's not what
the search engine was built for and Google probably
considers it a waste of their time and resources.
Link to "bad neighborhoods"
Bad neighborhoods are those sites that exist only to
propagate links. Because link popularity is one aspect of
how Google determines PageRank, some sites have set
up link farms, sites that exist only for the purpose of
building site popularity with bunches of links. The links
are not topical, like a specialty subject index, and
they're not well-reviewed, like Yahoo!; they're just a pile
of links. Another example of a bad neighborhood is a
general FFA page. FFA stands for free for all; it's a page
where anyone can add their link. Linking to pages like
that is grounds for a penalty from Google.
Now, what happens if a page like that links to you? Will
Google penalize your page? No. Google accepts that you
have no control over who links to your site.
8.10.2. Thou Shalt:
Create great content
All the HTML contortions in the world will do you little
good if you have lousy, old, or limited content. If you
create great content and promote it without playing
search engine games, you will get noticed and you will
get links. Remember Sturgeon's Law: "Ninety percent of
everything is crud." Why not make your web site an
exception?
8.10.3. What Happens If You Reform?
Maybe you have a site that is not exactly the work of a good
search engine citizen. Maybe you have 500 doorway pages, 10
title tags per page, and enough hidden text to make an O'Reilly
Pocket Guide. But maybe now you want to reform. You want to
have a clean lovely site and leave the doorway pages to Better
Homes and Gardens. Are you doomed? Will Google ban your site
for the rest of its life?
No. The first thing you need to do is clean up your site: remove
all traces of rule breaking. Next, send a note about your site
changes and the URL to help@google.com. Note that Google
really doesn't have the resources to answer every email about
why they did or didn't index a site (otherwise, they'd be
answering emails all day), and there's no guarantee that they
will reindex your kinder, gentler site. But they will look at
your message.
8.10.4. What Happens If You Spot Google
Abusers in the Index?
What if some other site that you come across in your Google
searching is abusing Google's spider and PageRank
mechanism? You have two options. You can send an email to
spamreport@google.com or fill out the form at
http://www.google.com/contact/spamreport.html. (I'd fill out the
form; it reports the abuse in a standard format that Google is
used to seeing.)
Hack 90. Clean Up for a Google Visit
Before you submit your site to Google, make sure that you've
cleaned it up to make the most of your indexing.
You clean up your house when you have important guests over,
right? If you want visitors, Google's crawler is one of the most
important guests that your site will ever have. A high Google
ranking can lead to incredible numbers of referrals, both from
Google's main site and from sites that have search powered by
Google.
To make the most of your listing, step back and look at your site.
By making some adjustments, you can make your site both more
Google-friendly and more visitor-friendly.
If you must use a splash page, have a text link from it.
If I had a dollar for every time I went to the front page of
a site and saw no way to navigate besides a Flash
movie, I'd be able to nap for a living. Google doesn't
index Flash files, so unless you have some kind of text
link on your splash page (a "Skip This Movie" link, for
example, that leads into the heart of your site), you're
not giving Google's crawler anything to work with. You're
also making it difficult for surfers who don't have Flash
or are visually impaired.
Make sure your internal links work.
Sounds like a no-brainer, doesn't it? Make sure your
internal page links work so the Google crawler can get to
all your site's pages. You'll also want to make sure that
your visitors can navigate.
Check your title tags.
There are few things sadder than getting a page of
search results and finding "Insert Your Title Here" as
the title for some of them. Not quite as bad, but close, is
getting results for the same domain and seeing the exact
same title tag over and over and over and over.
Look. Google makes it possible to search just the title
tags in its index. Further, the title tags are easy to read
on Google's search results and are an easy way for a
surfer to quickly get an idea of what a page is all about.
If you're not making the most of your title tag, you're
missing out on a lot of attention to your site.
The perfect title tag, to me, says something specific
about the page it heads, and is readable to both spiders
and surfers. That means you don't stuff it with as many
keywords as you can. Make it a readable sentence, or
(and I've found this useful for some pages) make it a
question.
Check your META tags.
Google sometimes relies on META tags for a site
description when there's a lot of navigation code that
wouldn't make sense to a human searcher. I'm not crazy
about META tags, but I'd make sure that at least the front
page of my web site had a description and keyword META
tag set, especially if my site relied heavily on code-based
navigation (like from JavaScript).
Check your ALT tags.
Do you use a lot of graphics on your pages? Do you have
ALT tags for them so that visually impaired surfers and
the Google spider can figure out what those graphics
are? If you have a splash page with nothing but graphics
on it, do you have ALT tags on all those graphics so that
a Google spider can get some idea of the content? ALT
tags are perhaps the most neglected aspect of a web
site. Make sure yours are set up.
By the way, just because ALT tags are a good idea, don't
go crazy. You don't have to explain in your ALT tags that
a list bullet is a list bullet. You can just mark it with an
asterisk.
Check your frames.
If you use frames, you might be missing out on some
indexing. Google recommends that you read Danny
Sullivan's article, "Search Engines and Frames," at
http://www.searchenginewatch.com/webmasters/frames.ht
Be sure either that Google can handle your frame setup
or that you've created an alternative way for Google to
visit, such as using the NOFRAMES tag.
Consider your dynamic pages.
Google says they "limit the number and amount of
dynamic pages" they index. Are you using dynamic
pages? Do you have to?
Consider how often you update your content.
There is some evidence that Google indexes popular
pages with frequently updated content more often. How
often do you update the content on your front page?
Make sure you have a robots.txt file if you need one.
If you want Google to index your site in a particular way,
make sure you've got a robots.txt file for the Google
spider to refer to. You can learn more about robots.txt in
general at http://www.robotstxt.org/wc/norobots.html.
If you don't want Google to cache your pages, you can
add a line to every page that you don't want cached.
Add this line to the <HEAD> section of your page:
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
This will tell all robots that archive content, including
engines like Daypop and Gigablast, not to cache your
page. If you want to exclude just the Google spider from
caching your page, use this line:
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
Hack 91. Remove Your Materials from
Google
Remove your content from Google's various web properties.
Some people are more than thrilled to have Google index their
sites. Other folks don't want the GoogleBot anywhere near them.
If you fall into the latter category and the bot's already done its
worst, there are several things you can do to remove your
materials from Google's index. Each part of Google (Web Search,
Google Images, and Google Groups) has its own set of
methodologies.
8.12.1. Google Web Search
Here are several tips to avoid being listed.
8.12.1.1 Making sure your pages never
get there to begin with
While you can take steps to remove your content from the
Google index after the fact, it's always much easier to make sure
the content is never found and indexed in the first place.
Google's crawler obeys the robot exclusion protocol, a set of
instructions you put on your web site that tells the crawler how
to behave when it comes to your content. You can implement
these instructions in two ways: via a META tag that you put on
each page (handy when you want to restrict access to only
certain pages or certain types of content) or via a robots.txt file
that you insert in your root directory (handy when you want to
block some spiders completely or want to restrict access to
kinds or directories of content). You can get more information
about the robots exclusion protocol and how to implement it at
http://www.robotstxt.org/.
8.12.1.2 Removing your pages after
they're indexed
There are several things you can have removed from Google's
results.
These instructions are for keeping your
site out of Google's index only. For
information on keeping your site out of all
major search engines, you'll have to work
with the robots exclusion protocol.
Removing the whole site
Use the robots exclusion protocol, probably with
robots.txt.
Removing individual pages
Use the following META tag in the HEAD section of each
page you want to remove:
<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">
Removing snippets
A snippet is the little excerpt of a page that Google
displays on its search result. To remove snippets, use
the following META tag in the HEAD section of each page for
which you want to prevent snippets:
<META NAME="GOOGLEBOT" CONTENT="NOSNIPPET">
Removing cached pages
To prevent Google from keeping cached versions of your
pages in its index, use the following META tag in the HEAD
section of each page for which you want to prevent
caching:
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
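Before asking Google to recrawl a page, it can help to confirm that the exclusion tags really are in place. What follows is only a sketch, written in Python (one of the languages covered later in the book) and using nothing but the standard library; the class name RobotsMetaParser and the helper robots_directives are made up for this illustration and are not part of any Google tool.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in ROBOTS and GOOGLEBOT META tags."""

    def __init__(self):
        super().__init__()
        # Maps the lowercased NAME value (e.g. "googlebot") to a set of
        # lowercased directives (e.g. {"noindex", "nofollow"}).
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names already lowercased.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        name = attributes.get("name", "").lower()
        if name not in ("robots", "googlebot"):
            return
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.setdefault(name, set()).add(directive)


def robots_directives(html):
    """Return the robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# A hypothetical page that uses two of the tags described above.
page = """<html><head>
<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
</head><body>Hello</body></html>"""

print(robots_directives(page))
```

This only reports what the page declares; whether and when a crawler honors the tags is up to the crawler.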
8.12.1.3 Removing that content now
Once you implement these changes, the next time GoogleBot
crawls your web site (usually within a few weeks), it will remove
or limit your content according to your META tags and robots.txt
file. If you want your materials removed right away, you can use
the automatic remover at
http://services.google.com:8882/urlconsole/controller. You'll
have to sign in with an account (requires an email address and a
password). Using the remover, you can request that Google crawl
your newly created robots.txt file, or you can enter the URL of a
page that contains exclusionary META tags.
Make sure that you have your exclusion
tags all set up before you use this service.
Going to all the trouble of getting Google
to pay attention to a robots.txt file or
exclusion rules that you've not yet set up
will simply be a waste of your time.
8.12.1.4 Reporting pages with
inappropriate content
While you may like your own content fine, you might find that,
even if you have filtering activated, you're getting search results
with explicit content. Or you might find a site with a misleading
title tag and content completely unrelated to your search.
You have two options for reporting these sites to Google. Bear in
mind that there's no guarantee that Google will remove the sites
from the index, but they will investigate them. At the bottom of
each page of search results, you'll see a "Dissatisfied? Help Us
Improve" link; follow it to a form for reporting inappropriate
sites. You can also send the URL of explicit sites that show up
on a SafeSearch but probably shouldn't to
safesearch@google.com. If you have more general complaints
about a search result, you can send an email to
search-quality@google.com.
8.12.2. Google Images
Google's Image database of materials is separate from that of
the main search index. To remove items from Google Images,
use robots.txt to specify that the GoogleBot Image crawler
should stay away from your site. Add these lines to your
robots.txt file:
User-agent: Googlebot-Image
Disallow: /
You can use the automatic remover mentioned in the web search
section to have Google remove the images from its index
database quickly.
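A quick local check can catch a typo in robots.txt before any crawler reads it. The sketch below uses Python's standard urllib.robotparser module to test the two-line file shown above; the helper name allowed and the sample paths are invented for the illustration, and the result reflects how Python's parser reads the rules, which is only an approximation of what Google's own crawlers do.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the text: keep the image crawler out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /
"""


def allowed(robots_txt, user_agent, path):
    """Return True if these robots.txt rules let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Hypothetical paths, used only to exercise the rules.
print(allowed(ROBOTS_TXT, "Googlebot-Image", "/photos/cat.jpg"))
print(allowed(ROBOTS_TXT, "Googlebot", "/index.html"))
```

With these rules the image crawler is refused while the main web crawler is still let in, which matches the intent of blocking only Google Images.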
There may be cases where someone has put images on their
server for which you own the copyright. In other words, you don't
have access to their server to add a robots.txt file, but you need
to stop Google from indexing your content there. In this case,
you need to contact Google directly. Google has instructions for
situations just like this at http://www.google.com/remove.html;
look at Option 2, "If you do not have any access to the server
that hosts your image."
8.12.3. Google Groups
As with the Google web index, you have the option both to
prevent material from being archived on Google and to remove
it after the fact.
8.12.3.1 Preventing your material from
being archived
To prevent your material from being archived on Google, add the
following line to the headers of your Usenet posts:
X-No-Archive: yes
If you do not have the option to edit the headers of your post,
make that line the first line in your post itself.
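For posts generated by a script, both variants are easy to produce. This is a rough sketch using Python's standard email.message module; the function build_post, its can_edit_headers flag, and the sample addresses and newsgroup are assumptions made for the example, and actually sending the post to a news server is left out.

```python
from email.message import EmailMessage


def build_post(sender, newsgroup, subject, body, can_edit_headers=True):
    """Build a Usenet-style message that asks not to be archived."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Newsgroups"] = newsgroup
    msg["Subject"] = subject
    if can_edit_headers:
        # Preferred: put the request in the headers.
        msg["X-No-Archive"] = "yes"
    else:
        # Fallback from the text: make it the first line of the post.
        body = "X-No-Archive: yes\n\n" + body
    msg.set_content(body)
    return msg


# Hypothetical sender and newsgroup, for illustration only.
post = build_post("me@example.com", "misc.test", "Hello", "Just testing.")
print(post["X-No-Archive"])
```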
8.12.3.2 Removing materials after the
fact
If you want materials removed after the fact, you have a couple
of options:
If the materials that you want removed were posted
under an address to which you still have access, you
can use the automatic removal tool mentioned earlier in
this hack.
If the materials that you want removed were posted
under an address to which you no longer have access,
you'll need to send an email to
groups-support@google.com with the following information:
Your full name and contact information, including
a verifiable email address.
The complete Google Groups URL or message
ID for each message you want removed.
A statement that says, "I swear under penalty of
civil or criminal laws that I am the person who
posted each of the foregoing messages or am
authorized to request removal by the person who
posted those messages."
Your electronic signature.
8.12.4. Google Phonebook
You might not want to have your contact information made
available via the phonebook searches on Google. You'll have to
follow one of two procedures, depending on whether the listing
you want removed is for a business or for a residential number.
If you want to remove a business phone number, you'll need to
send a request on your business letterhead to:
Google PhoneBook Removal
1600 Amphitheatre Parkway
Mountain View, CA 94043
Be sure to include a phone number so that Google can reach you
to verify your request.
Removing a residential phone number is much simpler. Fill out
the form at http://www.google.com/help/pbremoval.html. The
form asks for your name, city and state, phone number, email
address, and reason for removal (multiple choice: incorrect
number, privacy issue, or "other").
Chapter 9. Programming
Google
Hacks 92-100
Section 9.2. Signing Up and Google's Terms
Section 9.3. The Google Web APIs Developer's Kit
Section 9.4. Using Your Google API Key
Section 9.5. What's WSDL?
Section 9.6. Understanding the Google API Query
Section 9.7. Understanding the Google API Response
Section 9.8. A Note on Spidering and Scraping
Hack 92. Program Google in Perl
Hack 93. Install the SOAP::Lite Perl Module
Hack 94. Program Google with the Net::Google Perl
Module
Hack 95. Loop Around the 10-Result Limit
Hack 96. Program Google in PHP
Hack 97. Program Google in Java
Hack 98. Program Google in Python
Hack 99. Program Google in C# and .NET
Hack 100. Program Google in VB.NET
Hacks 92-100
When search engines first appeared on the scene, they were
more open to being spidered, scraped, and aggregated. Sites like
Excite and AltaVista didn't worry too much about the odd surfer
using Perl to grab a slice of a page or meta-search engines
including their results in aggregated search results. Sure,
egregious data suckers might get shut out, but the search
engines weren't worried about sharing their information on a
smaller scale.
Google never took that stance. Instead, they have regularly
prohibited meta-search engines from using their content without
a license, and they try their best to block unidentified web
agents like Perl's LWP::Simple module or even wget on the
command line. Google has even been known to block IP address
ranges for running automated queries.
Google had every right to do this; after all, it was their search
technology, database, and computer power. Unfortunately,
however, these policies meant that casual researchers and
Google nuts, like you and me, didn't have the ability to play with
their rich dataset in any automated way.
Google changed all that with the release of the Google Web API
(http://api.google.com/) in the spring of 2002. The Google Web
API doesn't allow you to do every kind of search possible (for
example, it doesn't support the phonebook: syntax), but it does
make available Google's eight-billion-page web database so that
developers can create their own interfaces and use Google
search results to their liking.
API stands for "Application Programming
Interface," a doorway for programmatic
access to a particular resource or
application, in this case, the Google index.
So how can you participate in all this Google API goodness?
You have to register for a developer's key, a login of sorts to the
Google API. Each key affords its owner 1,000 Google Web API
queries per day, after which you're out of luck until the next day.
Even if you don't plan on writing any applications, having a key
at your disposal is still useful. There are various third-party
applications built on the Google API that you might want to visit
and try out; some of these ask that you use your own key and
allotted 1,000 queries.
9.2. Signing Up and Google's Terms
Signing up for a Google Web API developer's key is simple. First,
create a Google account. The only requirements are a valid
email address and a made-up password.
Not only will you be able to use the Google Web
API, but you can also participate in Google Answers
(http://answers.google.com/answers/main),
volunteer to translate Google into other
languages for the Google in Your Language
program
(https://services.google.com/tc/Welcome.html),
and post to Usenet through Google Groups
[Chapter 4].
You will, of course, have to agree to Google's Terms and
Conditions (http://www.google.com/apis/download.html) before
you can proceed. In broad strokes, they say:
Google exercises no editorial control over the sites that
appear in its index. The Google API might return some
results that you might find offensive.
The Google API may be used for personal,
non-commercial use only. It may not be used to sell a
product or service or to drive traffic to a site for the sake
of advertising sales.
You can't noodle with Google's intellectual property
marks that appear within the API.
Google does not accept any liability for the use of their
API. This is a beta program.
You may indicate that the program you create uses the
Google API, but not if the application(s) "(1) tarnish,
infringe, or dilute Google's trademarks, (2) violate any
applicable law, and (3) infringe any third-party rights."
Any other use of Google's trademark or logo requires
written consent.
Once you've entered your email address, created a password,
and agreed to the Terms of Service, Google sends you an email
message to confirm the legitimacy of your email address. The
message includes a link for final activation of the account. Click
the link to activate your account, and Google will email you your
very own license key.
You have signed in and generated a key; you're all set! What
now? If you don't intend to do any programming, just stop here.
Put your key in a safe place and keep it on hand to use with any
cool third-party Google API-based services that you come
across.
9.3. The Google Web APIs Developer's
Kit
If you are interested in doing some programming, download the
Google Web APIs Developer's Kit
(http://www.google.com/apis/download.html). While not strictly
necessary to any Google API programming that you might do,
the kit contains much that is useful:
A cross-platform WSDL file (see below)
A Java wrapper library abstracting away some of the
SOAP plumbing
A sample .NET application
Documentation, including JavaDoc and SOAP XML
samples
Simply click the download link, unzip the file, and take a look at
the README.txt file to get underway.
9.4. Using Your Google API Key
Every time you send a request to the Google server in a
program, you have to send your key along with it. Google checks
the key and determines whether it's valid and you're still within
your daily 1,000 query limit; if so, Google processes the
request.
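Because the limit is enforced on Google's side, a program only learns that it has run out when a request is refused. A local counter is one way to stop early and leave queries in reserve. The following is a minimal sketch in Python; the QuotaTracker class is invented for this illustration, and it assumes the allowance resets when the local calendar date changes, which may not match the moment Google actually resets it.

```python
import datetime

DAILY_LIMIT = 1000  # per-key daily allowance quoted in the text


class QuotaTracker:
    """Count queries locally so a program can stop before the limit."""

    def __init__(self, limit=DAILY_LIMIT):
        self.limit = limit
        self.day = None   # date the current count belongs to
        self.used = 0     # queries recorded for that date

    def try_spend(self, today=None):
        """Record one query; return False if today's allowance is gone."""
        today = today or datetime.date.today()
        if today != self.day:
            # Assumption: a new local date means a fresh allowance.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


tracker = QuotaTracker()
if tracker.try_spend():
    print("OK to send the query")
else:
    print("Out of queries until tomorrow")
```

A real application would also need to persist the count between runs; this sketch keeps it in memory only.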
All the programs in this book, regardless of language and
platform, provide a place to plug in your key. The key itself is
just a string of random-looking characters (e.g.,
12BuCK13mY5h0E/34KN0cK@ttH3Do0R).
If you're going to be making your hack available online
for others to use, you might well consider asking
visitors to sign up for and use their own Google API
key, at least optionally. A thousand queries per day
really isn't that much, and should your hack become
popular, you'll more than likely have a few unhappy
visitors for whom it just doesn't work when you've used
up your quota. You can see an example of this in action
on Tara's GoogleJack! page
(http://www.researchbuzz.org/archives/001418.shtml);
notice the spot in the GoogleJack! form for Google API
Key.
A Perl hack usually includes a line like the following:
...
# Your Google API developer's key.
my $google_key='insert key here';
...
The Java GoogleAPIDemo included in the Google Web APIs
Developer's Kit is invoked on the command line, like so:
% java -cp googleapi.jar com.google.soap.search.GoogleAPIDemo
insert_key_here search ostrich
In both cases, insert key here or insert_key_here should be
substituted with your own Google Web API key. For example, I
would plug my made-up key into the Perl script as follows:
...
# Your Google API developer's key.
my $google_key='12BuCK13mY5h0E/34KN0cK@ttH3Do0R';
...
9.5. What's WSDL?
Pronounced "whiz-dill," WSDL stands for Web Services
Description Language, an XML format for describing web
services. The most useful bit of the Google Web API Developer's
Kit is GoogleSearch.wsdl, a WSDL file describing the Google
API's available services, method names, and expected
arguments to your programming language of choice.
Most of the hacks in this book assume that the
GoogleSearch.wsdl file is in the same directory as the scripts that
you're writing since this is probably the simplest setup. If you
prefer to keep it elsewhere, be sure to alter the path in the script
at hand. A Perl hack usually specifies the location of the WSDL
file, like so:
...
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
...
I like to keep such files together in a library directory and so
would make the following adjustment to the previous code
snippet:
...
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "/home/me/lib/GoogleSearch.wsdl";
...
9.6. Understanding the Google API
Query
The core of a Google application is the query. Without the query,
there's no Google data, and without that, you don't have much of
an application. Because of its importance, it's worth taking a
little time to look into the anatomy of a typical query.
9.6.1. Query Essentials
The command in a typical Perl-based Google API application
that sends a query to Google looks like this:
my $results = $google_search ->
doGoogleSearch(
key, query, start, maxResults,
filter, restrict, safeSearch, lr,
ie, oe
);
Usually, the items within the parentheses are variables,
numbers, or Boolean values (true or false). In the previous
example, I've included the names of the arguments themselves
rather than sample values so that you can see their definitions
here:
key
This is where you put your Google API developer's key.
Without a key, the query won't go very far.
query
This is your query, composed of keywords, phrases, and
special syntaxes.
start
Also known as the offset, this integer value specifies at
what result to start counting when determining which 10
results to return. If this number were 16, the Google API
would return results 16-25; if 300, results 300-309
(assuming, of course, that your query found that many
results). This is known as a zero-based index, since
counting starts at 0, not 1. The first result is result 0,
and the 999th, 998. It's a little odd, admittedly, but you
get used to it quickly, especially if you go on to do a lot of
programming. Acceptable values are 0 to 999 because
Google only returns up to a thousand results for a query.
maxResults
This integer specifies the number of results that you
would like the API to return. The API returns results in
batches of up to ten, so acceptable values are 1 through
10.
filter
You might think that the filter option concerns the
SafeSearch filter for adult content. It doesn't. This
Boolean value (true or false) specifies whether your
results go through automatic query filtering, removing
near-duplicate content (titles and snippets that are very
similar) and multiple (more than two) results from the
same host or site. With filtering enabled, only the first
two results from each host are included in the result set.
restrict
No, restrict doesn't have anything to do with SafeSearch
either. It allows for restricting your search to one of
Google's topical searches or to a specific country.
Google has four topic restricts: U.S. Government
(unclesam), Linux (linux), Macintosh (mac), and FreeBSD
(bsd). You'll find the complete country list in the Google
Web API documentation. To leave your search
unrestricted, leave this option blank (usually signified by
empty quotation marks, "").
safeSearch
Now here's the SafeSearch filtering option. This Boolean
(true or false) specifies whether results returned will be
filtered for questionable (read: adult) content.
lr
This stands for language restrict and it's a bit tricky.
Google has a list of languages in its API documentation
to which you can restrict search results, or you can
simply leave this option blank and have no language
restrictions.
There are several ways that you can restrict to
language. First, you can simply include a language code.
If you want to restrict results to English, for example,
use lang_en. But you can also restrict results to more
than one language, separating each language code with
a | (pipe), signifying OR. lang_en|lang_de, then, constrains
results to only those "in English or German."
You can omit languages from results by prepending them
with a - (minus sign). -lang_en returns all results but
those in English.
ie
This stands for input encoding, allowing you to specify
the character encoding used in the query that you're
feeding the API. Google's documentation says, "Clients
should encode all request data in UTF-8 and should
expect results to be in UTF-8." In the first iteration of
Google's API program, the Google API documentation
offered a table of encoding options (latin1, cyrillic, etc.)
but now everything is UTF-8. In fact, any value other
than UTF-8 is summarily ignored.
oe
This stands for output encoding. As with input encoding,
everything's UTF-8.
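The ten arguments are positional, so getting their order or ranges wrong is an easy mistake. The sketch below, in Python, gathers them into a tuple in the order listed above and checks the documented ranges for start and maxResults. The function names build_search_args and page_starts are invented for this illustration, nothing here talks to Google, and passing Python Booleans and the string "utf8" is an assumption about what a given SOAP library expects.

```python
def build_search_args(key, query, start=0, max_results=10,
                      filter_results=True, restrict="",
                      safe_search=False, lr=""):
    """Return the ten doGoogleSearch arguments in their documented order."""
    if not 0 <= start <= 999:
        raise ValueError("start must be between 0 and 999")
    if not 1 <= max_results <= 10:
        raise ValueError("maxResults must be between 1 and 10")
    return (key, query, start, max_results,
            filter_results, restrict, safe_search, lr,
            "utf8", "utf8")  # ie and oe: everything is UTF-8


def page_starts(wanted, page_size=10):
    """Return the start offsets needed to collect `wanted` results."""
    wanted = min(wanted, 1000)  # Google returns at most 1,000 results
    return list(range(0, wanted, page_size))


# Hypothetical key and query, for illustration only.
args = build_search_args("insert key here", "ostrich", start=100,
                         safe_search=True)
print(args)
print(page_starts(25))
```

Each offset returned by page_starts would cost one query from the daily allowance.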
9.6.2. A Sample
Enough with the placeholders; what does an actual query look
like?
Take, for example, a query that uses variables for the key and
the query, requests 10 results starting at result number 100
(actually the 101st result), and specifies that filtering and
SafeSearch be turned on. That query in Perl would look like this:
my $results = $google_search ->
doGoogleSearch(
$google_key, $query, 100, 10,
"true", "", "true", "",
"utf8", "utf8"
);
Note that the key and query could just as easily have been
passed along as quote-delimited strings:
my $results = $google_search ->
doGoogleSearch(
"12BuCK13mY5h0E/34KN0cK@ttH3Do0R", "+paloentology +dentistry", 100, 10,
"true", "", "true", "",
"utf8", "utf8"
);
While things appear a little more complex when you start fiddling
with the language and topic restrictions, the core query remains
mostly unchanged; only the values of the options change.
9.6.3. Intersecting Country, Language, and
Topic Restrictions
Sometimes you might want to restrict your results to a particular
language in a particular country, or a particular language,
particular country, and particular topic. Now here's where things
start looking a little on the odd side.
The rules are as follows:
Omit something by prepending it with a - (minus sign).
Separate restrictions with a . (period, or full stop);
spaces are not allowed.
Specify an OR relationship between two restrictions with
a | (pipe).
Group restrictions with parentheses. You can have
parentheses within parentheses (nested parentheses) for
fine-grained control over grouping in your queries.
Let's say you want a query to return results in French, draw only
from Canadian sites, and focus only within the Linux topic. Your
query would look something like this:
my $results = $google_search ->
doGoogleSearch(
$google_key, $query, 100, 10,
"true", "linux.countryCA", "true", "lang_fr",
"utf8", "utf8"
);
For results from Canada or from France, you would use:
"linux.(countryCA|countryFR)"
Or maybe you want results in French, but from anywhere but
France:
"linux.(-countryFR)"
For a comprehensive list of restricts, see Section 2.4,
"Restricts," of APIs_Reference.html, part of the Google API
documentation.
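Restrict strings like these are fiddly to type by hand, so a few small helpers can assemble them from the rules given above. This is a sketch in Python; the function names (all_of, any_of, none_of, languages) are made up for the illustration, and the helpers do no checking that a topic or country code is one Google actually recognizes.

```python
def all_of(*parts):
    """Join restrictions with a period: every one of them must hold."""
    return ".".join(parts)


def any_of(*parts):
    """Group restrictions with parentheses and pipes: an OR relationship."""
    return "(" + "|".join(parts) + ")"


def none_of(part):
    """Omit a restriction by prepending a minus sign, grouped as in the text."""
    return "(-" + part + ")"


def languages(*codes):
    """Build an lr value that accepts any of the given language codes."""
    return "|".join("lang_" + code for code in codes)


# The restrict and lr values shown in the text.
print(all_of("linux", "countryCA"))
print(all_of("linux", any_of("countryCA", "countryFR")))
print(all_of("linux", none_of("countryFR")))
print(languages("en", "de"))
```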
9.6.4. Putting Query Elements to Use
You might use the different elements of the query as follows:
Using SafeSearch
If you're building a program that's for family-friendly
use, you'll probably want to have SafeSearch turned on
as a matter of course. But you can also use it to
compare safe and unsafe results. [Hack #35] does just
that. You could create a program that takes a word from
a web form and checks its counts in filtered and
unfiltered searches, providing a naughty rating for the
word based on the counts.
Setting search result numbers
Whether you request 1 or 10 results, you're still using
one of your developer key's daily dose of a thousand
Google Web API queries. Wouldn't it then make sense to
always request 10? Not necessarily; if you're using only
the top result (to bounce the browser to another page,
generate a random query string for a password, or
whatever), you might as well add even the minutest
amount of speed to your application by not requesting
results that you're just going to throw out or ignore.
Searching different topics
With four different s pec ialty topic s available for
s earc hing through the G oogle A P I , dozens of different
languages , and dozens of different c ountries , there are
thous ands of c ombinations of topic /language/c ountry
res tric tion that you would work through.
C ons ider an open s ource country applic ation. You c ould
c reate a lis t of keywords very s pec ific to open s ourc e
(s uc h as linux, perl, etc .) and c reate a program that
c yc les through a s eries of queries that res tric ts your
s earc h to an open s ourc e topic (s uc h as linux) and a
partic ular c ountry. So you might dis c over that perl was
mentioned in Franc e in the linux topic 1 5 times , in
G ermany 2 0 times , and s o on.
You c ould als o c onc entrate les s on the program its elf
and more on an interfac e to ac c es s thes e variables . H ow
about a form with pull- down menus that allows you to
res tric t your s earc hes by c ontinent (ins tead of c ountry)?
You c ould s pec ify whic h c ontinent in a variable that's
pas s ed to the query. O r how about an interfac e that lets
the us er s pec ify a topic and c yc les through a lis t of
c ountries and languages , pulling res ult c ounts for eac h
one?
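To make the SafeSearch idea concrete, here is one way a naughty rating might be computed once you have the two result counts in hand. This is a sketch only, in Python: the formula (the share of results that SafeSearch removed) and the function name naughty_rating are assumptions made for illustration, not anything defined by Google or by [Hack #35], and the counts are passed in rather than fetched.

```python
def naughty_rating(unfiltered_count, filtered_count):
    """Return the share of results removed by SafeSearch, from 0.0 to 1.0.

    Both arguments are estimated result counts for the same word, one
    taken with SafeSearch off and one with it on. The formula is an
    illustrative assumption.
    """
    if unfiltered_count <= 0:
        # No results at all, so there is nothing to rate.
        return 0.0
    # Estimates can wobble; never report a negative number of removals.
    removed = max(unfiltered_count - filtered_count, 0)
    return removed / unfiltered_count


if __name__ == "__main__":
    print(naughty_rating(1000, 250))
```

Because the counts are estimates that vary from one invocation to the next, treat the rating as a rough indicator rather than a measurement.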
9.7. Understanding the Google API Response
While the Google API grants you programmatic access to
Google's Web index, it doesn't provide all the functionality
available through the Google.com web site's search interface.
9.7.1. Can Do
The Google API, in addition to simple keyword queries, supports
the following ["Special Syntaxes" in Chapter 1]:
site:
daterange:
intitle:
inurl:
allintext:
allinlinks:
filetype:
info:
link:
related:
cache:
9.7.2. Can't Do
The Google API does not support these special syntaxes:
phonebook:
rphonebook:
bphonebook:
stocks:
While queries of this sort provide no individual results,
aggregate result data is sometimes returned and can prove
rather useful. googly.php [Hack #96], for instance, displays the
number of results (estimatedTotalResultsCount).
9.7.3. The 10-Result Limit
While searches through the standard Google.com home page can
be tuned ["Setting Preferences" in Chapter 1] to return 10, 20,
30, 50, or 100 results per page, the Google Web API limits the
number to 10 per query. This doesn't mean, mind you, that the
rest are not available to you, but it takes a wee bit of creative
programming entailing looping through results, 10 at a time
[Hack #95].
9.7.4. What's in the Results
The Google API provides both aggregate and per-result data in
its result set.
9.7.4.1 Aggregate data
The aggregate data, information on the query itself and on the
kinds and number of results that query turned up, consists of:
<documentFiltering>
A Boolean (true/false) value specifying whether or not
results were filtered for very similar results or those that
come from the same web host.
<searchComments>
Any commentary (e.g., a note about stop words being
removed) Google might throw in that would usually be
displayed just beneath the search box on a typical
Google results page.
<estimatedTotalResultsCount>
An estimate of how many results might be found for your
search in the Google index. This number may vary from
invocation to invocation, moment to moment, thus the
"estimated" proviso.
<estimateIsExact>
Google may sometimes be sure of its
estimatedTotalResultsCount, in which case estimateIsExact
will be set to true.
<resultElements>
The individual results themselves, returned as an array.
<searchQuery>
Your Google query, right back at you.
<startIndex>
The index of the first result in the current array of
results. Assuming your query asked for a start of 0, the
first result will have a startIndex of 1. If you asked for a
start of 25, startIndex would be 26. Yes, I know it's
confusing that start is zero-based, while startIndex is
one-based, but that's the way the cookie crumbles, I'm
afraid.
<endIndex>
The index of the last result in the current array of
results. This is always whatever you set as start +
maxResults in your query, unless the total is greater than
the number of estimatedTotalResultsCount, in which case
it is simply estimatedTotalResultsCount.
<searchTips>
May provide suggestions on better using Google,
suitable for displaying to the end user.
<directoryCategories>
A list of directory categories, if any, associated with the
query.
<searchTime>
The time spent by the Google server (in seconds) on
your search.
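The zero-based start and one-based startIndex are easy to trip over, so here is the arithmetic from the startIndex and endIndex descriptions written out as a small Python sketch. The function name result_window is hypothetical; the sketch simply restates the rules given above and assumes start is smaller than the estimated total.

```python
def result_window(start, max_results, estimated_total):
    """Return (startIndex, endIndex) as described for the API response.

    `start` is the zero-based offset passed in the query; the indexes
    that come back in the response are one-based.
    """
    start_index = start + 1
    # endIndex is start + maxResults unless that overshoots the total.
    end_index = min(start + max_results, estimated_total)
    return start_index, end_index


if __name__ == "__main__":
    print(result_window(0, 10, 1000))   # first page
    print(result_window(25, 10, 1000))  # start of 25
    print(result_window(20, 10, 23))    # last, partial page
```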
9.7.4.2 Individual search result data
The "guts" of a search result (the URLs, page titles, and
snippets) are returned in a <resultElements> list. Each result
consists of the following elements:
<summary>
The Google Directory summary, if available.
<URL>
The search result's URL; it consistently starts with
http://.
<snippet>
A brief excerpt of the page with query terms highlighted
in bold (HTML <b> </b> tags).
<title>
The page title in HTML.
<cachedSize>
The size in kilobytes (K) of the Google-cached version of
the page, if available.
<relatedInformationPresent>
If set to 1, means a related: search on the current
result's URL will turn up something of use.
<hostName>
When you set filter to true in your query, only two
results from the same hostname are included in your set
of results. In the second of these results, hostName is set
to the host from which the result came.
<directoryTitle>
The title under which this result appears in the Google
Directory (http://directory.google.com, a.k.a. the Open
Directory Project) if it is in the directory at all.
<directoryCategory>
The Google Directory category, if any, in which you'll find
this result. <directoryCategory> consists of
<fullViewableName>, the name given to the category itself,
and <specialEncoding>, any special encoding assigned to
the directory category at hand.
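Since <title> and <snippet> come back as HTML, a script that prints to a terminal or a plain-text file may want to strip the markup first. The following Python sketch shows one way to do that, assuming only the <b> highlighting described above and the <br> line breaks that turn up in the sample output of the hacks in this chapter. The function name plain_text is hypothetical, and the pattern is deliberately narrow; it is not a general HTML cleaner.

```python
import html
import re

# Matches <b>, </b>, <br>, <br/> and <br />, in any letter case.
TAG = re.compile(r"</?(?:b|br)\s*/?>", re.IGNORECASE)


def plain_text(fragment):
    """Strip bold and line-break tags and decode HTML entities."""
    without_tags = TAG.sub("", fragment or "")
    return html.unescape(without_tags).strip()


if __name__ == "__main__":
    print(plain_text("... <b>learning perl</b>, 3rd Edition ..."))
```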
You no doubt notice the conspicuous absence of PageRank.
Google does not make PageRank available through anything but
the official Google Toolbar [Hack #60]. You can get a general
idea of a page's popularity by looking over the popularity bars in
the Google Directory.
9.8. A Note on Spidering and Scraping
Some small share of the hacks in this book involve spidering, or
meandering through sites and scraping data from their web
pages to be used outside of their intended context. Given that
we have the Google API at our disposal, why then do we resort
at times to spidering and scraping?
The main reason is simply that you can't gain access to
everything Google through the API. While it nicely serves the
purposes of searching the Web programmatically, the API (at the
time of this writing) doesn't go any further than Google's main
web search index. And it's even limited in what you can pull from
the index. You can't do a phonebook search, trawl Google News,
leaf through Google Catalogs, or interact in any way with any of
Google's other specialty search properties.
So, while Google provides a good start in its API, there are more
often than not situations in which you can't get to the Google
data that you're most interested in. Not to mention combining
what you can get through the Google API with data from other
sites without such a convenience. That's where spidering and
scraping come in.
That said, there are a few things that you need to keep in mind
when resorting to scraping:
Scrapers are brittle
The shelf life of a scraper is only as long as the page it
is scraping remains formatted in about the same manner.
When the page changes, your scraper can, and most
likely will, break.
Tread lightly
Tread lightly, taking only as much as you need and no
more. If all you need is the data from the page that you
already have open in your browser, save the source and
scrape that.
Maximize your effectiveness
Make the most out of every page you scrape. Rather
than hitting Google again and again for the next 10
results and the next 10, set your preferences ["Setting
Preferences" in Chapter 1] so that you get all you can
on a single page. For instance, set your preferred
number of results to 100 rather than the default 10.
Mind the terms of service
It might be tempting to go one step further and create
programs that automate retrieving and scraping, but
you're more likely to tread on the toes of the site owner
(Google or otherwise) and be asked to leave or simply
locked out.
So use the API whenever you can, scrape only when you
absolutely must, and mind your p's and q's when fiddling about
with other people's data.
Hack 92. Program Google in Perl
This simple script illustrates the basics of programming the
Google Web API with Perl and lays the groundwork for the
lion's share of hacks to come.
The vast majority of hacks in this book are written in Perl. While
the specifics vary from hack to hack, much of the busy work of
querying the Google API and looping over the results remains
essentially the same. This hack is utterly basic, providing a
foundation on which to build more complex and interesting
applications. If you haven't done anything of this sort before,
this hack is a good starting point for experimentation. It simply
submits a query to Google and prints out the results.
9.9.1. The Code
Type the following code into your preferred plain-text editor (be it
Notepad, TextEdit, or a command-line editor like vi or Emacs) and
save it to a file named googly.pl. Remember to replace insert key
here with your Google API key, as explained in "Using Your
Google API Key" earlier in this chapter.
In addition to the Google API Developer's
Kit, you'll need the SOAP::Lite Perl
module installed [Hack #93] before
running this hack.
#!/usr/local/bin/perl
# googly.pl
# A typical Google Web API Perl script.
# Usage: perl googly.pl <query>

# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wsdl = "./GoogleSearch.wsdl";

use strict;

# Use the SOAP::Lite Perl module.
use SOAP::Lite;

# Take the query from the command line.
my $query = shift @ARGV or die "Usage: perl googly.pl <query>\n";

# Create a new SOAP::Lite instance, feeding it GoogleSearch.wsdl.
my $google_search = SOAP::Lite->service("file:$google_wsdl");

# Query Google.
my $results = $google_search ->
  doGoogleSearch(
    $google_key, $query, 0, 10, "false", "", "false",
    "", "latin1", "latin1"
  );

# No results?
@{$results->{resultElements}} or exit;

# Loop through the results.
foreach my $result (@{$results->{resultElements}}) {
  # Print out the main bits of each result.
  print
    join "\n",
    $result->{title} || "no title",
    $result->{URL},
    $result->{snippet} || 'no snippet',
    "\n";
}
9.9.2. Running the Hack
Run this script from the command line ["How to Run the Hacks"
in the Preface], passing it any Google search that you want to
run like so:
$ perl googly.pl "query keywords"
9.9.3. The Results
Here's a sample run. The first attempt doesn't specify a query
and so triggers a usage message and doesn't go any further. The
second searches for learning perl and prints out the results.
% perl googly.pl
Usage: perl googly.pl <query>
% perl googly.pl "learning perl"
oreilly.com -- Online Catalog: Learning
Perl, 3rd Edition
http://www.oreilly.com/catalog/lperl3/
... learning perl, 3rd Edition Making Easy Things Easy and Hard Th
Possible By Randal L. Schwartz, Tom Phoenix 3rd Edition July
2001 0-596-00132-0
...
Amazon.com: buying info: learning perl (2nd Edition)
http://www.amazon.com/exec/obidos/ASIN/1565922840
... learning perl takes common programming idioms and expresses th
in "perlish"<br> terms. ... (learning perl,
Programming Perl, Perl Cookbook).
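The output above is nothing more than each result's title, URL, and snippet joined by newlines, with a fallback string where a title or snippet is missing. The same formatting step can be sketched in Python, with a plain dictionary standing in for one element of resultElements. The dictionary, its sample values, and the function name format_result are assumptions for illustration; no query is actually sent.

```python
def format_result(result):
    """Format one result as googly.pl prints it.

    `result` is a dict with the keys "title", "URL" and "snippet";
    an empty or missing title or snippet gets a placeholder.
    """
    return "\n".join([
        result.get("title") or "no title",
        result.get("URL", ""),
        result.get("snippet") or "no snippet",
        "\n",  # Leaves a blank line between results.
    ])


if __name__ == "__main__":
    sample = {
        "title": "",
        "URL": "http://www.example.com/",
        "snippet": "A made-up snippet for illustration.",
    }
    print(format_result(sample), end="")
```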
Hack 93. Install the SOAP::Lite Perl Module
Install the SOAP::Lite Perl module, backbone of the vast
majority of hacks in this book.
The SOAP::Lite (http://www.soaplite.com) Perl module is the de
facto standard for interfacing with SOAP-based web services
from Perl. As such, it is used extensively throughout this book
and in hacks that you might stumble across online.
While teaching you how to install Perl modules is beyond the
scope of this book, we've included these instructions to
bootstrap your Google hacking without need of wandering off in
search of a Perl book.
It's unfortunately rather common for Internet service providers
(ISPs) not to make SOAP::Lite available to their users. In many
cases, ISPs are rather restrictive in general about what modules
they make available and scripts they allow users to execute.
Others are more accommodating and more than willing to install
Perl modules on request. Before taking up your time and
brainpower installing SOAP::Lite yourself, ask your service
provider if it's already there or if it can be installed for you.
Probably the easiest way to install SOAP::Lite is via another Perl
module, CPAN, included with just about every modern Perl
distribution. The CPAN module automates the installation of
Perl modules, fetching components and any prerequisites from
the Comprehensive Perl Archive Network (thus the name,
CPAN) and building the whole kit-and-kaboodle on the fly.
CPAN installs modules into standard
system-wide locations and, therefore,
assumes you're running as the root user. If
you have no more than regular user
access, you'll have to install SOAP::Lite
and its prerequisites by hand ["Unix
Installation by Hand" in the next section].
9.10.1. Unix and Mac OS X Installation via CPAN
Assuming you have the CPAN module, have root access, and
are connected to the Internet, installation should be no more
complicated than:
% su
Password:
# perl -MCPAN -e shell
cpan shell -- CPAN exploration and modules installation (v1.52)
ReadLine support available (try ``install Bundle::CPAN'')
cpan> install SOAP::Lite
Or, if you prefer one-liners:
% sudo perl -MCPAN -e 'install SOAP::Lite'
In either case, go grab yourself a cup of coffee, meander the
garden, read the paper, and check back once in a while. Your
terminal's sure to be riddled with incomprehensible
gobbledygook that you can, for the most part, summarily ignore.
You may be asked a question or three; in most cases, simply
hitting return to accept the default answer will do the trick.
9.10.2. Unix Installation by Hand
If CPAN installation didn't quite work as expected, you can of
course install SOAP::Lite by hand. Download the latest version
from SOAPLite.com (http://www.soaplite.com/), unpack, and
build it like so:
% tar xvzf SOAP-Lite-latest.tar.gz
SOAP-Lite-0.55
SOAP-Lite-0.55/Changes
...
SOAP-Lite-0.55/t/37-mod_xmlrpc.t
SOAP-Lite-0.55/t/TEST.pl
% cd SOAP-Lite-0.XX
% perl Makefile.PL
We are about to install SOAP::Lite and for your convenience will
provide you with list of modules and prerequisites, so you'll be a
to choose only modules you need for your configuration.
XMLRPC::Lite, UDDI::Lite, and XML::Parser::Lite are included by de
Installed transports can be used for both SOAP::Lite and XMLRPC::L
Client HTTP support (SOAP::Transport::HTTP::Client) [yes]
Client HTTPS support (SOAP::Transport::HTTPS::Client... [no]
...
SSL support for TCP transport (SOAP::Transport::TCP) [no]
Compression support for HTTP transport (SOAP::Transport... [no]
Do you want to proceed with this configuration? [yes]
During "make test" phase we may run tests with several SOAP server
that may take long and may fail due to server/connectivity problem
Do you want to perform these tests in addition to core tests? [no]
Checking if your kit is complete...
Looks good
...
% make
mkdir blib
mkdir blib/lib
...
% make test
PERL_DL_NONLAZY=1 /usr/bin/perl -Iblib/arch -Iblib/lib
-I/System/Library/Perl/darwin -I/System/Library/Perl -e 'use
Test::Harness qw(&runtests $verbose); $verbose=0; runtests @ARGV;'
t/01-core.t t/02-payload.t t/03-server.t t/04-attach.t t/05-custom
t/06-modules.t t/07-xmlrpc_payload.t t/08-schema.t t/01-core......
...
% sudo make install
Password:
Installing /Library/Perl/XMLRPC/Lite.pm
Installing /Library/Perl/XMLRPC/Test.pm
...
If, during the perl Makefile.PL phase, you run into any warnings
about installing prerequisites, install each in turn before
attempting to install SOAP::Lite again. A typical prerequisite
warning looks something like this:
Checking if your kit is complete...
Looks good
Warning: prerequisite HTTP::Daemon failed to load: Can't locate
HTTP/Daemon.pm in @INC (@INC contains: /System/Library/Perl/darwin
/System/Library/Perl /Library/Perl/darwin /Library/Perl /Library/P
/Network/Library/Perl/darwin /Network/Library/Perl
/Network/Library/Perl .) at (eval 8) line 3.
If you have little more than user access to the system and still
insist on installing SOAP::Lite yourself, you'll have to install it
and all its prerequisites somewhere in your home directory. ~/lib,
a lib directory in your home directory, is as good a place as any.
Inform Perl of your preference like so:
% perl Makefile.PL LIB=/home/login/lib
Replace /home/login/lib with an appropriate path.
9.10.3. Windows Installation via PPM
If you're running Perl under Windows, chances are it's
ActiveState's ActivePerl
(http://www.activestate.com/Products/ActivePerl/). Thankfully,
ActivePerl's outfitted with a CPAN-like module installation
utility. The Programmer's Package Manager (PPM,
http://aspn.activestate.com/ASPN/Downloads/ActivePerl/PPM/)
grabs nicely packaged module bundles from the ActiveState
archive and drops them into place on your Windows system with
little need of help from you.
Simply launch PPM from inside a DOS terminal window and tell it
to install the SOAP::Lite bundle.
C:\>ppm
PPM interactive shell (2.1.6) - type 'help' for available commands
PPM> install SOAP::Lite
If you're running a reasonably recent build, you're probably in for
a pleasant surprise:
C:\>ppm
PPM interactive shell (2.1.6) - type 'help' for available commands
PPM> install SOAP::Lite
Version 0.55 of 'SOAP-Lite' is already installed.
9.10.4. A Note About Expat
There's a little something called Expat
(http://expat.sourceforge.net) that, more often than not, is the
one hiccup in the installation process, particularly when installing
using the CPAN module or by hand. Expat is an XML parser
library written in the C programming language and underlying
many of the XML modules that you might use. Fortunately, you'll
probably find it installed by default on the system you're using,
but if it isn't there, you won't get very far.
The easiest way to install Expat under Mac OS X or Unix/Linux
goes a little something like this:
$ curl -O http://easynews.dl.sourceforge.net/sourceforge/expat/exp
.tar.gz
$ tar -xvzf expat-X.XX.X.tar.gz
...
$ cd expat-X.XX.X
$ ./configure
$ make
$ sudo make install
Hack 94. Program Google with the Net::Google Perl Module
A crisp, clean, object-oriented alternative to programming
Google with Perl and the SOAP::Lite module.
An alternative, more object-oriented Perl interface to the Google
API is Aaron Straup Cope's Net::Google
(http://search.cpan.org/search?query=net+google&mode=module).
While not fundamentally different from
using SOAP::Lite [Hack #93] as we do throughout this book,
constructing Google API queries and dealing with the results is
a little cleaner.
There are three main Google API interfaces defined by the
module: search(), spelling(), and cache() for talking to the
Google Web search engine, spellchecker, and Google cache,
respectively.
To provide a side-by-side comparison to googly.pl [Hack #92],
the typical SOAP::Lite-based way to talk to the Google API,
we've provided a script identical in function and almost so in
structure.
9.11.1. The Code
Save the following script as net_googly.pl. Replace insert key
here with your Google API key as you type in the code.
Mind you, you'll still need SOAP::Lite and a
couple of other prerequisites to use
Net::Google.
#!/usr/local/bin/perl
# net_googly.pl
# A typical Google API script using the Net::Google Perl module.
# Usage: perl net_googly.pl <query>

use strict;

# Use the Net::Google Perl module.
use Net::Google;

# Your Google API developer's key.
use constant GOOGLE_API_KEY => 'insert key here';

# Take the query from the command line.
my $query = shift @ARGV or die "Usage: perl net_googly.pl <query>\n";

# Create a new Net::Google instance.
my $google = Net::Google->new(key => GOOGLE_API_KEY);

# And create a new Net::Google search instance.
my $search = $google->search();

# Build a Google query.
$search->query($query);
$search->starts_at(0);
$search->max_results(10);
$search->filter(0);

# Query Google.
$search->results();

# Loop through the results.
foreach my $result ( @{$search->results()} ) {
  # Print out the main bits of each result.
  print
    join "\n",
    $result->title() || "no title",
    $result->URL(),
    $result->snippet() || 'no snippet',
    "\n";
}
Notice that the code is all but identical to that of googly.pl [Hack
#92]. The only real changes are cleaner
object-oriented method calls for setting query parameters and
dealing with the results. So, rather than passing a set of
parameters to a SOAP::Lite service call like this:
doGoogleSearch(
  $google_key, $query, 0, 10, "false", "", "false",
  "", "latin1", "latin1"
);
Set these parameters individually like this:
$search->query($query);
$search->starts_at(0);
$search->max_results(10);
$search->filter(0);
Not much difference, but definitely cleaner.
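Part of what makes the method calls easier to read is that each value carries its name with it, whereas the doGoogleSearch call relies on you remembering the order of ten positional arguments. The Python sketch below pairs that positional list with names, using the parameter names from the $parameters array in googly.php [Hack #96]. The helper name_parameters is hypothetical and only builds a dictionary; it doesn't talk to Google.

```python
# The ten doGoogleSearch arguments, in the order they are passed.
PARAMETER_ORDER = (
    "key", "q", "start", "maxResults", "filter",
    "restrict", "safeSearch", "lr", "ie", "oe",
)


def name_parameters(*values):
    """Pair a positional doGoogleSearch argument list with its names."""
    if len(values) != len(PARAMETER_ORDER):
        raise ValueError(
            f"expected {len(PARAMETER_ORDER)} values, got {len(values)}"
        )
    return dict(zip(PARAMETER_ORDER, values))


if __name__ == "__main__":
    params = name_parameters(
        "insert key here", "learning perl", 0, 10, "false", "",
        "false", "", "latin1", "latin1",
    )
    for name, value in params.items():
        print(f"{name:12} {value!r}")
```

Seen this way, the two "false" strings in the positional call turn out to be filter and safeSearch, and the two empty strings restrict and lr.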
9.11.2. Running the Hack
Invoke the hack on the command line in just the same manner
you did in [Hack #92]:
$ perl net_googly.pl "query keywords"
The results will be just the same.
Hack 95. Loop Around the 10-Result Limit
If you want more than 10 results, you'll have to loop.
The Google API returns only 10 results per query, plenty for
some queries, but for most applications, 10 results barely
scratch the surface. If you want more than 10 results, you're
going to have to loop, querying for the next set of 10 each time.
The first query returns the top 10. The next, 11 through 20. And
so forth.
This hack builds on the basic Perl script googly.pl [Hack #92].
To get at more than the
top 10 results, no matter the programming language you're
using, you'll have to create a loop.
Bear in mind that each and every query
counts against your daily allotment. Loop
three times and you've used up three
queries. Ten, and you're down 10. While
this doesn't seem like much given your
quota of 1,000 queries a day, you'd be
surprised how quickly you can reach the
bottom of the cookie jar without knowing
where they all went.
9.12.1. The Code
Save the following code to a text file named looply.pl. Again,
remember to replace insert key here with your Google API key,
as explained in "Using Your Google API Key" earlier in this
chapter.
In addition to the Google API Developer's
Kit, you'll need the SOAP::Lite Perl
module [Hack #93] installed before
running this hack.
The alterations to googly.pl needed to support looping
through more than the first 10 results are the $loops setting,
the $number counter, and the outer for loop over $offset.
#!/usr/local/bin/perl
# looply.pl
# A typical Google Web API Perl script.
# Usage: perl looply.pl <query>

# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wsdl = "./GoogleSearch.wsdl";

# Number of times to loop, retrieving 10 results at a time.
my $loops = 3; # 3 loops x 10 results per loop = top 30 results

use strict;

# Use the SOAP::Lite Perl module.
use SOAP::Lite;

# Take the query from the command line.
my $query = shift @ARGV or die "Usage: perl looply.pl <query>\n";

# Create a new SOAP::Lite instance, feeding it GoogleSearch.wsdl.
my $google_search = SOAP::Lite->service("file:$google_wsdl");

# Keep track of result number.
my $number = 0;

for (my $offset = 0; $offset <= ($loops-1)*10; $offset += 10) {
  # Query Google.
  my $results = $google_search ->
    doGoogleSearch(
      $google_key, $query, $offset, 10, "false", "", "false",
      "", "latin1", "latin1"
    );

  # No sense continuing unless there are more results.
  last unless @{$results->{resultElements}};

  # Loop through the results.
  foreach my $result (@{$results->{'resultElements'}}) {
    # Print out the main bits of each result.
    print
      join "\n",
      ++$number,
      $result->{title} || "no title",
      $result->{URL},
      $result->{snippet} || 'no snippet',
      "\n";
  }
}
Notice that the script tells Google which set of 10 results it's
after by passing an offset ($offset). The offset is increased by
10 each time ($offset += 10).
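The offset arithmetic is the heart of the hack, so it's worth seeing on its own. Here is a minimal sketch in Python that mirrors the loop's two rules: step the offset by 10, and stop early once a page comes back empty. The names offsets, collect, and fetch_page are hypothetical; fetch_page stands in for whatever call actually queries Google, and the demonstration uses a fake one backed by a list so the sketch runs offline.

```python
def offsets(loops, page_size=10):
    """Return the start offsets for `loops` queries: 0, 10, 20, ..."""
    return list(range(0, loops * page_size, page_size))


def collect(fetch_page, loops, page_size=10):
    """Gather up to loops * page_size results, one page per query.

    `fetch_page(offset, page_size)` is a stand-in for the real query
    and must return a list of results (empty when there are no more).
    """
    results = []
    for offset in offsets(loops, page_size):
        page = fetch_page(offset, page_size)
        if not page:
            # No sense continuing unless there are more results.
            break
        results.extend(page)
    return results


if __name__ == "__main__":
    # A fake result set of 23 items in place of a real search.
    fake_index = [f"result {n}" for n in range(1, 24)]

    def fake_fetch(offset, page_size):
        return fake_index[offset:offset + page_size]

    print(offsets(3))
    print(len(collect(fake_fetch, 3)))
```

Each call to fetch_page would cost one query against the daily quota, so the early exit saves queries as well as time.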
9.12.2. Running the Script
Run this script from the command line ["How to Run the Hacks"
in the Preface], passing it your Google search:
$ perl looply.pl "query keywords"
9.12.3. The Results
Here's a sample run. The first attempt doesn't specify a query
and so triggers a usage message and doesn't go any further. The
second searches for learning perl and prints out the results.
Output is just the same as for the googly.pl script [Hack #92],
but now the number of results you net is limited only by
your specified loop count (in this case 3, netting 3 x 10, or 30,
results).
% perl looply.pl
Usage: perl looply.pl <query>
% perl looply.pl "learning perl"
oreilly.com -- Online Catalog: Learning Perl , 3rd Edition
http://www.oreilly.com/catalog/lperl3/
... Learning Perl , 3rd Edition Making Easy Things
Easy and Hard Things Possible By Randal<br> L. Schwartz, Tom Phoen
3rd Edition July 2001 0-596-00132-0, Order Number ...
...
29
Intro to Perl for CGI
http://hotwired.lycos.com/webmonkey/98/47/index2a.html
... Some people feel that the benefits of learning
Perl scripting are few.<br> But ... part. That's right.
Learning Perl is just like being a cop. ...
30
WebDeveloper.com ®: Where Web Developers and Designers Learn How .
http://www.webdeveloper.com/reviews/book6.html
... Registration CreditCard Processing Compare Prices.
Learning Perl . Learning <br> Perl , 2nd Edition.
Publisher: O'Reilly Author: Randal Schwartz
...
9.12.4. Hacking the Hack
Alter the value assigned to the $loops variable to change the
number of results. For instance, to loop 9 times and grab the top
90 results, change things like so:
# Number of times to loop, retrieving 10 results at a time.
my $loops = 9; # 9 loops x 10 results per loop = top 90 results
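Before you crank $loops up, it helps to work out what a given setting costs you. The Python sketch below does the arithmetic, assuming the 10-results-per-query limit and the quota of 1,000 queries a day mentioned earlier in this hack. The function names are hypothetical, made up for illustration.

```python
RESULTS_PER_QUERY = 10   # the API's per-query limit
DAILY_QUOTA = 1000       # queries per developer key per day


def loops_needed(wanted_results):
    """Number of queries needed to fetch `wanted_results` results."""
    # Round up: 25 results still take three queries.
    return (wanted_results + RESULTS_PER_QUERY - 1) // RESULTS_PER_QUERY


def searches_per_day(wanted_results):
    """How many such searches fit into one day's quota."""
    return DAILY_QUOTA // loops_needed(wanted_results)


if __name__ == "__main__":
    for wanted in (10, 30, 90):
        print(wanted, loops_needed(wanted), searches_per_day(wanted))
```

At 9 loops per search, a day's quota covers only 111 searches, which is how the cookie jar empties faster than you'd expect.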
Hack 96. Program Google in PHP
A simple example of programming the Google Web API with
PHP and the NuSOAP module.
PHP (http://www.php.net/), a recursive acronym for PHP:
Hypertext Preprocessor, has seen wide use as the HTML-
embedded scripting language for web development. Add to that
the NuSOAP PHP module for creating and consuming SOAP-
based web services (http://dietrich.ganx4.com/nusoap) and you
have a powerful combination.
This hack illustrates basic use of PHP and NuSOAP in concert
to interact with the Google Web API.
9.13.1. The Code
Save the following code as a plain text file named googly.php
somewhere on your web site where PHP is able to run. Don't
forget to replace insert key here with your Google API key.
<!--
# googly.php
# A typical Google Web API php script.
# Usage: Point your browser at googly.php\
-->
<html>
<head>
<title>googly.php</title>
</head>
<body>
<h1>Googly</h1>
<form method="GET">
Query: <input name="query" value="<? print $HTTP_GET_VARS['query']
<input type="submit" name="Search">
</form>
<?
# Run the search only if you're provided a query to work with.
if ($HTTP_GET_VARS['query']) {
# Use the NuSOAP php library.
require_once('nusoap.php');
# Set parameters.
$parameters = array(
'key'=>'insert key here',
'q' => $HTTP_GET_VARS['query'],
'start' => 0,
'maxResults' => 10,
'filter' => false,
'restrict' => '',
'safeSearch' => false,
'lr' => '',
'ie' => 'latin',
'oe' => 'latin'
);
# Create a new SOAP client, feeding it GoogleSearch.wsdl on Goog
$soapclient = new soapclient("http://api.google.com/search/beta2
# Query Google.
$results = $soapclient->call('doGoogleSearch',$parameters, 'urn:
'urn:GoogleSearch');
  # Results?
  if ( is_array($results['resultElements']) ) {
    print "<p>Your Google query for '" . $HTTP_GET_VARS['query'] .
    . $results['estimatedTotalResultsCount'] . " results, the top
    foreach ( $results['resultElements'] as $result ) {
      print
        "<p><a href='" . $result['URL'] . "'>" .
        ( $result['title'] ? $result['title'] : 'no title' ) .
        "</a><br />" . $result['URL'] . "<br />" .
        ( $result['snippet'] ? $result['snippet'] : 'no snippet' ) .
        "</p>";
    }
  }
  # No results.
  else {
    print "Your Google query for '" . $HTTP_GET_VARS['query'] . "'
  }
}
?>
</body>
</html>
9.13.2. Running the Hack
Point your web browser at your googly.php, fill in a query, and
click the Search button. Figure 9-1 shows the results of a
search for php.
Figure 9-1. Google results by way of googly.php
9.13.3. See Also
An alternative is the Services_Google package
(http://pear.php.net/package/Services_Google), a PHP 5
interface to the Google API.
Hack 97. Program Google in Java
Programming the Google Web API in Java is a snap, thanks to
the functionality packed into the Google Web API Developer's
Kit.
Thanks to the Java Archive (JAR) file included in the Google Web
API Developer's Kit, programming the Google API in Java
couldn't be simpler. The googleapi.jar archive includes
com.google.soap.search, a nice, clean wrapper around the
underlying Google SOAP, along with the Apache Software
Foundation's open source Crimson
(http://xml.apache.org/crimson) XML parser and Apache SOAP
(http://xml.apache.org/soap/) stack, among others.
In addition to the googleapi.jar file included
in the Google API Developer's Kit, you'll
need a copy of the Java 2 Platform,
Standard Edition (J2SE,
http://java.sun.com/downloads/) to
compile and run this hack.
9.14.1. The Code
Save the following code to a file called Googly.java:
// Googly.java
// Bring in the Google SOAP wrapper.
import com.google.soap.search.*;
import java.io.*;

public class Googly {
  // Your Google API developer's key.
  private static String googleKey = "insert key here";

  public static void main(String[] args) {
    // Make sure there's a Google query on the command line.
    if (args.length != 1) {
      System.err.println("Usage: java [-classpath classpath] Googly <query>");
      System.exit(1);
    }
    // Create a new GoogleSearch object.
    GoogleSearch s = new GoogleSearch();
    try {
      s.setKey(googleKey);
      s.setQueryString(args[0]); // Google query from the command line
      s.setMaxResults(10);
      // Query Google.
      GoogleSearchResult r = s.doSearch();
      // Gather the results.
      GoogleSearchResultElement[] re = r.getResultElements();
      // Output.
      for ( int i = 0; i < re.length; i++ ) {
        System.out.println(re[i].getTitle());
        System.out.println(re[i].getURL());
        System.out.println(re[i].getSnippet() + "\n");
      }
    // Anything go wrong?
    } catch (GoogleSearchFault f) {
      System.out.println("GoogleSearchFault: " + f.toString());
    }
  }
}
Be sure to drop in your own Google developer's key in place of
insert key here, like so:
// Your Google API developer's key.
private static String googleKey = "12BuCK13mY5h0E/34KN0cK@ttH3Do0R";
9.14.2. Compiling the Code
To successfully compile the Googly application, you'll need that
googleapi.jar archive. I chose to keep it in the same directory as
my Googly.java source file; if you put it elsewhere, adjust the
path after -classpath accordingly.
% javac -classpath googleapi.jar Googly.java
This should leave you with a brand new Googly.class file, ready to
run.
9.14.3. Running the Hack
Run Googly on the command line ["How to Run the Hacks" in the
Preface], passing it your Google query, like so under Unix and
Mac OS X:
% java -classpath .:googleapi.jar Googly "query words"
and like so under Windows (notice the ; instead of : in the
classpath):
java -classpath .;googleapi.jar Googly "query words"
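The only difference between the two invocations is the classpath separator. As a minimal sketch (written in Python, and not part of the hack itself), a small launcher could pick the right separator for the current platform with os.pathsep. The helper names build_classpath and googly_command are hypothetical, introduced here only for illustration:

```python
import os


def build_classpath(entries, sep=os.pathsep):
    """Join classpath entries with the platform's path-list separator.

    os.pathsep is ':' on Unix and Mac OS X and ';' on Windows.
    """
    return sep.join(entries)


def googly_command(query, jar="googleapi.jar", sep=os.pathsep):
    """Return the argument list for running Googly with the given query."""
    classpath = build_classpath([".", jar], sep)
    return ["java", "-classpath", classpath, "Googly", query]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(googly_command("query words")))
```

Passing the resulting list to subprocess.run would launch the hack without any shell quoting of the query.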
9.14.4. The Results
% java -classpath .:googleapi.jar Googly "Learning Java"
oreilly.com -- Online Catalog: Learning Java
http://www.oreilly.com/catalog/learnjava/
For programmers either just migrating to Java or already working
steadily in the forefront of Java development, Learning Java gives
a clear, systematic ...
oreilly.com -- Online Catalog: Learning Java , 2nd Edition
http://www.oreilly.com/catalog/learnjava2/
This new edition of Learning Java has been expanded and updated fo
Java 2 Standard Edition SDK 1.4. It comprehensively addresses ...
...
Java Programming...From the Grounds Up / Web Developer
http://www.webdeveloper.com/java/java_programming_grounds_up.html
... WebDeveloper.com. Java Programming... From the Grounds Up. by
Mark C. Reynolds ... Java Classes and Methods. Java utilizes the
basic object technology found in C++. ...
Hack 98. Program Google in Python
Programming the Google Web API with Python is simple and
clean, as these scripts and interactive examples demonstrate.
Programming to the Google Web API from Python is a piece of
cake, thanks to Mark Pilgrim's PyGoogle wrapper module
(http://pygoogle.sourceforge.net/), now maintained by Brian
Landers. PyGoogle abstracts away much of the underlying
SOAP, XML, and request/response layers, leaving you free to
spend your time with the data itself.
9.15.1. PyGoogle Installation
Download a copy of PyGoogle
(http://sourceforge.net/project/showfiles.php?group_id=99616)
and follow the installation instructions
(http://pygoogle.sourceforge.net/dist/readme.txt). Assuming all
goes to plan, this should be nothing more complex than:
% python setup.py install
Alternatively, if you want to give this a whirl without installing
PyGoogle or don't have permissions to install it globally on your
system, simply put the included SOAP.py and google.py files into
the same directory as the googly.py script itself.
9.15.2. The Code
Save this code to a text file called googly.py. Be sure to replace
insert key here with your own Google API key.
#!/usr/bin/python
# googly.py
# A typical Google Web API Python script using Mark Pilgrim's
# PyGoogle Google Web API wrapper
# [http://diveintomark.org/projects/pygoogle/].
# Usage: python googly.py <query>
import sys, string, codecs
# Use the PyGoogle module.
import google
# Grab the query from the command line
if sys.argv[1:]:
    query = sys.argv[1]
else:
    sys.exit('Usage: python googly.py <query>')
# Your Google API developer's key.
google.LICENSE_KEY = 'insert key here'
# Query Google.
data = google.doGoogleSearch(query)
# Teach standard output to deal with utf-8 encoding in the results
sys.stdout = codecs.lookup('utf-8')[-1](sys.stdout)
# Output.
for result in data.results:
    print string.join( (result.title, result.URL, result.snippet), "\n"), "\n"
9.15.3. Running the Hack
Invoke the script on the command line ["How to Run the Hacks"
in the Preface] as follows:
% python googly.py "query words"
9.15.4. The Results
Here's a sample run, searching for "learning python":
% python googly.py "learning python"
oreilly.com -- Online Catalog: <b>Learning</b> <b>Python</b>
http://www.oreilly.com/catalog/lpython/
<b>Learning</b> <b>Python</b> is an
introduction to the increasingly popular interpreted programming
language that's portable, powerful, and remarkably easy to use in
<b>...</b>
...
Book Review: <b>Learning</b> <b>Python</b>
http://www2.linuxjournal.com/lj-issues/issue66/3541.html
<b>...</b> Issue 66: Book Review: <b>Learning</b>
<b>Python</b> <b>...</b> Enter
<b>Learning</b> <b>Python</b>. My executive summary
is that this is the right book for me and probably for many others
as well. <b>...</b>
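As the sample run shows, titles and snippets come back with HTML markup such as <b> and <br> embedded in them. The following is a minimal sketch, in modern Python 3 rather than the Python 2 of the script above, of one way to reduce such a fragment to plain text before printing it. The function name plain_text is hypothetical, and the regular expression is only a rough tag stripper suited to short fragments like these, not a general HTML parser:

```python
import html
import re

# Matches anything that looks like an HTML tag, e.g. <b>, </b>, <br>.
_TAG = re.compile(r"<[^>]+>")


def plain_text(fragment):
    """Strip tags, decode character entities, and collapse whitespace."""
    without_tags = _TAG.sub("", fragment)
    decoded = html.unescape(without_tags)
    return " ".join(decoded.split())


if __name__ == "__main__":
    print(plain_text("oreilly.com -- Online Catalog: <b>Learning</b> <b>Python</b>"))
```

Applied to each of result.title and result.snippet, this leaves the URL untouched and makes the output easier to read in a terminal.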
9.15.5. Hacking the Hack
Python has a marvelous interface for working interactively with
the interpreter. It's a good place to experiment with modules
such as PyGoogle, querying the Google API on the fly and
digging through the data structures it returns.
Here's a sample interactive PyGoogle session demonstrating
the use of the doGoogleSearch, doGetCachedPage, and
doSpellingSuggestion functions:
% python
Python 2.2 (#1, 07/14/02, 23:25:09)
[GCC Apple cpp-precomp 6.14] on darwin
Type "help", "copyright", "credits" or "license" for more informat
>>> import google
>>> google.LICENSE_KEY = 'insert key here'
>>> data = google.doGoogleSearch("Learning Python")
>>> dir(data.meta)
['__doc__', '__init__', '__module__', 'directoryCatego
'documentFiltering', 'endIndex', 'estimateIsExact',
'estimatedTotalResultsCount', 'searchComments', 'searchQuery',
'searchTime', 'searchTips', 'startIndex']
>>> data.meta.estimatedTotalResultsCount
115000
>>> data.meta.directoryCategories
[{u'specialEncoding': '', u'fullViewableName': "Top/Business/Indus
Publishing/Publishers/Nonfiction/Business/O'Reilly_and_Associates/
Technical_Books/Python"}]
>>> dir(data.results[5])
['URL', '__doc__', '__init__', '__module__', 'cachedSi
'directoryCategory', 'directoryTitle', 'hostName',
'relatedInformationPresent', 'snippet', 'summary', 'title']
>>> data.results[0].title
'oreilly.com -- Online Catalog: <b>Learning</b> <b>Python'
>>> data.results[0].URL
'http://www.oreilly.com/catalog/lpython/'
>>> google.doGetCachedPage(data.results[0].URL)
'<meta http-equiv="Content-Type" content="text/html; charset=ISO-8
<BASE HREF="http://www.oreilly.com/catalog/lpython/"><table border
...
>>> google.doSpellingSuggestion('lurn piethon')
'learn python'
Hack 99. Program Google in C# and .NET
Create GUI and console Google search applications with C# and
the .NET framework.
The Google Web APIs Developer's Kit includes a sample C#
Visual Studio .NET (http://msdn.microsoft.com/vstudio/) project
for a simple GUI Google search application (take a look in the
dotnet/CSharp folder). The functional bits that you would probably
find most interesting are in the Form1.cs code.
This hack provides basic code for a simple console Google
search application similar in function (and, in the case of Java
[Hack #97], form, too) to those in Perl, Python [Hack #98], et al.
Compiling and running this hack requires that you have the .NET
Framework
(http://msdn.microsoft.com/netframework/downloads/)
installed.
9.16.1. The Code
Type this code and save it to a text file called googly.cs:
// googly.cs
// A Google Web API C# console application.
// Usage: googly.exe <query>
// Copyright (c) 2002, Chris Sells.
// No warranties extended. Use at your own risk.
using System;
class Googly {
    static void Main(string[] args) {
        // Your Google API developer's key.
        string googleKey = "insert key here";
        // Take the query from the command line.
        if( args.Length != 1 ) {
            Console.WriteLine("Usage: googly.exe <query>");
            return;
        }
        string query = args[0];
        // Create a Google SOAP client proxy, generated by:
        // c:\> wsdl.exe http://api.google.com/GoogleSearch.wsdl
        GoogleSearchService googleSearch = new GoogleSearchService( );
        // Query Google.
        GoogleSearchResult results = googleSearch.doGoogleSearch(googleKey,
            query, 0, 10, false, "", false, "", "latin1", "latin1");
        // No results?
        if( results.resultElements == null ) return;
        // Loop through results.
        foreach( ResultElement result in results.resultElements ) {
            Console.WriteLine( );
            Console.WriteLine(result.title);
            Console.WriteLine(result.URL);
            Console.WriteLine(result.snippet);
            Console.WriteLine( );
        }
    }
}
Remember to insert your Google developer's key in place of
insert key here, like so:
// Your Google API developer's key.
string googleKey = "12BuCK13mY5h0E/34KN0cK@ttH3Do0R";
9.16.2. Compiling the Code
Before compiling the C# code itself, you must create a Google
SOAP client proxy. The proxy is a wodge of code custom-built to
the specifications of the GoogleSearch.wsdl file, an XML-based
description of the Google Web Service, all its methods,
parameters, and return values. Fortunately, you don't have to do
this by hand; the .NET Framework kit includes an application,
wsdl.exe, that does all the coding for you.
This is a remarkable bit of magic if you
think about it: the lion's share of
interfacing to a web service auto-generated
from a description thereof.
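To get a feel for the work a generated proxy saves you, here is a rough sketch (in Python, not C#) of the kind of SOAP request such a proxy assembles for doGoogleSearch. The parameter names and their order are taken from the generated proxy code in this hack. Everything else is an assumption made for illustration: the urn:GoogleSearch namespace, the build_request helper, and the omission of type attributes. A real proxy also sends the request and decodes the response, which this sketch does not attempt:

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
GOOGLE_NS = "urn:GoogleSearch"  # assumed service namespace

# Parameter order as passed by the generated doGoogleSearch proxy method.
PARAMS = ["key", "q", "start", "maxResults", "filter",
          "restrict", "safeSearch", "lr", "ie", "oe"]


def build_request(**values):
    """Return a SOAP envelope (as a string) wrapping a doGoogleSearch call."""
    missing = [name for name in PARAMS if name not in values]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, "{%s}doGoogleSearch" % GOOGLE_NS)
    for name in PARAMS:
        value = values[name]
        element = ET.SubElement(call, name)
        # XML Schema booleans are written in lowercase: true / false.
        if isinstance(value, bool):
            element.text = str(value).lower()
        else:
            element.text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_request(key="insert key here", q="query words", start=0,
                        maxResults=10, filter=False, restrict="",
                        safeSearch=False, lr="", ie="latin1", oe="latin1"))
```

The argument values mirror the call made in googly.cs: start at result 0, ask for 10 results, and leave filtering, restrictions, and SafeSearch off.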
Call wsdl.exe with the location of your GoogleSearch.wsdl file like
so:
C:\GOOGLY.NET>wsdl.exe GoogleSearch.wsdl
If you don't happen to have the WSDL file handy, don't fret. You
can point wsdl.exe at its location on Google's web site:
C:\GOOGLY.NET\CS>wsdl.exe http://api.google.com/GoogleSearch.wsdl
Microsoft (R) Web Services Description Language Utility
[Microsoft (R) .NET Framework, Version 1.0.3705.0]
Copyright (C) Microsoft Corporation 1998-2001. All rights reserved
Writing file 'C:\GOOGLY.NET\CS\GoogleSearchService.cs'.
The end result is a GoogleSearchService.cs file that looks
something like this:
//----------------------------------------------------------------
// <autogenerated>
// This code was generated by a tool.
// Runtime Version: 1.0.3705.288
//
// Changes to this file may cause incorrect behavior and will
// the code is regenerated.
// </autogenerated>
//----------------------------------------------------------------
//
// This source code was auto-generated by wsdl, Version=1.0.3705.2
//
using System.Diagnostics;
using System.Xml.Serialization;
using System;
using System.Web.Services.Protocols;
using System.ComponentModel;
using System.Web.Services;
...
public System.IAsyncResult BegindoGoogleSearch(string key,
    string q, int start, int maxResults, bool filter, string restrict,
    bool safeSearch, string lr, string ie, string oe,
    System.AsyncCallback callback, object asyncState) {
    return this.BeginInvoke("doGoogleSearch", new object[] {
        key,
        q,
        start,
        maxResults,
        filter,
        restrict,
        safeSearch,
        lr,
        ie,
        oe}, callback, asyncState);
}
...
Now on to compiling googly.cs itself:
C:\GOOGLY.NET\CS>csc /out:googly.exe *.cs
Microsoft (R) Visual C# .NET Compiler version 7.00.9466
for Microsoft (R) .NET Framework version 1.0.3705
Copyright (C) Microsoft Corporation 2001. All rights reserved.
9.16.3. Running the Hack
Run Googly on the command line ["How to Run the Hacks" in the
Preface], passing it your Google query:
C:\GOOGLY.NET\CS>googly.exe "query words"
The DOS command window isn't the best
at displaying and allowing scrollback of
lots of output. To send the results of your
Google query to a file for perusal in your
favorite text editor, append > results.txt.
9.16.4. The Results
Here's a sample run:
% googly.exe "WSDL while you work"
Axis/Radio interop, actual and potential
http://www.intertwingly.net/stories/2002/02/08/axisradioInteropActualAndPotential.html
<b>...</b> But
<b>you</b> might find more exciting services here
<b>...</b> Instead, we should <b>work</b>
together and<br> continuously strive to <b>...</b>
<b>While</b> <b>WSDL</b> is certainly far from
perfect and has many <b>...</b>
...
Simplified <b>WSDL</b>
http://capescience.capeclear.com/articles/simplifiedWSDL/
<b>...</b> So how does it <b>work</b>?
<b>...</b> If <b>you</b> would like to edit
<b>WSDL</b> <b>while</b> still avoiding<br> all
those XML tags, check out the <b>WSDL</b> Editor in
CapeStudio. <b>...</b>
Chris Sells and Rael Dornfest
Hack 100. Program Google in VB.NET
Create GUI and console Google search applications with Visual
Basic and the .NET framework.
Along with the functionally identical C# version [Hack #99], the
Google Web APIs Developer's Kit (dotnet/Visual Basic folder)
includes a sample Google search in Visual Basic. While you can
probably glean just about all you need from the Google Demo
Form.vb code, this hack provides basic code for a simple console
Google search application without the possible opacity of a
full-blown Visual Studio .NET project.
Compiling and running this hack requires that you have the .NET
Framework
(http://msdn.microsoft.com/netframework/downloads/)
installed.
9.17.1. The Code
Save the following code to a text file called googly.vb:
' googly.vb
' A Google Web API VB.NET console application.
' Usage: googly.exe <query>
' Copyright (c) 2002, Chris Sells.
' No warranties extended. Use at your own risk.
Imports System
Module Googly
    Sub Main(ByVal args As String( ))
        ' Your Google API developer's key.
        Dim googleKey As String = "insert key here"
        ' Take the query from the command line.
        If args.Length <> 1 Then
            Console.WriteLine("Usage: googly.exe <query>")
            Return
        End If
        Dim query As String = args(0)
        ' Create a Google SOAP client proxy, generated by:
        ' c:\> wsdl.exe /l:vb http://api.google.com/GoogleSearch.wsdl
        Dim googleSearch As GoogleSearchService = New GoogleSearchService( )
        ' Query Google.
        Dim results As GoogleSearchResult = _
            googleSearch.doGoogleSearch(googleKey, query, 0, 10, _
            False, "", False, "", "latin1", "latin1")
        ' No results?
        If results.resultElements Is Nothing Then Return
        ' Loop through results.
        Dim result As ResultElement
        For Each result In results.resultElements
            Console.WriteLine( )
            Console.WriteLine(result.title)
            Console.WriteLine(result.URL)
            Console.WriteLine(result.snippet)
            Console.WriteLine( )
        Next
    End Sub
End Module
You'll need to replace insert key here with your Google API key.
Your code should look something like this:
' Your Google API developer's key.
Dim googleKey As String = "12BuCK13mY5h0E/34KN0cK@ttH3Do0R"
9.17.2. Compiling the Code
Not surprisingly, compiling the code for the VB and .NET
application is very similar to compiling the code in C# and .NET
[Hack #99].
Before compiling the VB application code itself, you must create
a Google SOAP client proxy. The proxy is a wodge of code
custom-built to the specifications of the GoogleSearch.wsdl file,
an XML-based description of the Google Web Service, all its
methods, parameters, and return values. Fortunately, you don't
have to do this by hand; the .NET Framework kit includes an
application, wsdl.exe, to do all the coding for you.
Call wsdl.exe with the location of your GoogleSearch.wsdl file and
specify that you'd like VB proxy code:
C:\GOOGLY.NET\VB>wsdl.exe /l:vb GoogleSearch.wsdl
If you don't happen to have the WSDL file handy, don't fret. You
can point wsdl.exe at its location on Google's web site:
C:\GOOGLY.NET\VB>wsdl.exe /l:vb http://api.google.com/GoogleSearch.wsdl
Microsoft (R) Web Services Description Language Utility
[Microsoft (R) .NET Framework, Version 1.0.3705.0]
Copyright (C) Microsoft Corporation 1998-2001. All rights reserved
Writing file 'C:\GOOGLY.NET\VB\GoogleSearchService.vb'.
What you get is a GoogleSearchService.vb file with all that
underlying Google SOAP handling ready to go:
'-----------------------------------------------------------------
' <autogenerated>
' This code was generated by a tool.
' Runtime Version: 1.0.3705.288
'
' Changes to this file may cause incorrect behavior and will b
' the code is regenerated.
' </autogenerated>
'-----------------------------------------------------------------
Option Strict Off
Option Explicit On
Imports System
Imports System.ComponentModel
Imports System.Diagnostics
Imports System.Web.Services
Imports System.Web.Services.Protocols
Imports System.Xml.Serialization
...
Public Function BegindoGoogleSearch(ByVal key As String, _
        ByVal q As String, ByVal start As Integer, _
        ByVal maxResults As Integer, ByVal filter As Boolean, _
        ByVal restrict As String, ByVal safeSearch As Boolean, _
        ByVal lr As String, ByVal ie As String, ByVal oe As String, _
        ByVal callback As System.AsyncCallback, _
        ByVal asyncState As Object) As System.IAsyncResult
    Return Me.BeginInvoke("doGoogleSearch", New Object( ) {key, q, _
        start, maxResults, filter, restrict, safeSearch, lr, ie, oe}, _
        callback, asyncState)
End Function
'<remarks/>
Public Function EnddoGoogleSearch( _
        ByVal asyncResult As System.IAsyncResult) As GoogleSearchResult
    Dim results( ) As Object = Me.EndInvoke(asyncResult)
    Return CType(results(0),GoogleSearchResult)
End Function
End Class
...
Now to compile that googly.vb:
C:\GOOGLY.NET\VB>vbc /out:googly.exe *.vb
Microsoft (R) Visual Basic .NET Compiler version 7.00.9466
for Microsoft (R) .NET Framework version 1.00.3705
Copyright (C) Microsoft Corporation 1987-2001. All rights reserved
9.17.3. Running the Hack
Run Googly on the command line ["How to Run the Hacks" in the
Preface], passing it your Google query:
C:\GOOGLY.NET\VB>googly.exe "query words"
The DOS command window isn't the best
at displaying and allowing scrollback of
lots of output. To send the results of your
Google query to a file for perusal in your
favorite text editor, append > results.txt.
9.17.4. The Results
Functionally identical to its C# counterpart [Hack #99], the
Visual Basic hack should turn up about the same results, Google
index willing.
Chris Sells and Rael Dornfest
Colophon
Our look is the result of reader comments, our own
experimentation, and feedback from distribution channels.
Distinctive covers complement our distinctive approach to
technical topics, breathing personality and life into potentially
dry subjects.
The tool on the cover of Google Hacks, Second Edition, is a pair of
locking pliers. Locking pliers are very versatile tools. They can
be used for turning, twisting, cutting wire, tightening screws and
bolts, and clamping. Locking pliers are specially designed to put
pressure on a bolt or nut in such a way that the user can
approach the nut or bolt from any angle. A simple squeeze can
put up to a ton of pressure between the pliers' jaws, enabling
them to lock onto even oddly shaped pieces. Locking pliers
include a guarded release, which prevents accidental release or
pinching, and a trigger, which unlocks the pliers.
Adam Witwer was the production editor and copyeditor for Google
Hacks, Second Edition. Leanne Soylemez was the proofreader.
Mary Brady and Claire Cloutier provided quality control. Reg
Aubry wrote the index.
Edie Freedman designed the cover of this book. The cover image
is an original photograph by Edie Freedman. Clay Fernald
produced the cover layout with QuarkXPress 4.1 using Adobe's
Helvetica Neue and ITC Garamond fonts.
Melanie Wang designed the interior layout based on a template
by David Futato. This book was converted by Julie Hawks to
FrameMaker 5.5.6 with a format conversion tool created by Erik
Ray, Jason McIntosh, Neil Walls, and Mike Sierra that uses Perl
and XML technologies. The text font is Linotype Birka; the
heading font is Adobe Helvetica Neue Condensed; and the code
font is LucasFont's TheSans Mono Condensed. The illustrations
that appear in the book were produced by Robert Romano and
Jessamyn Read using Macromedia FreeHand MX and Adobe
Photoshop CS. This colophon was written by Linley Dolby.
The online edition of this book was created by the Safari
production group (John Chodacki, Ellie Cutler, and Ken
Douglass) using a set of Frame-to-XML conversion and cleanup
tools written and maintained by Erik Ray, Benn Salter, John
Chodacki, Ellie Cutler, and Jeff Liggett.
Index
[SYMBOL] [A] [B] [C] [D] [E] [F] [G] [H] [I] [J]
[K] [L] [M] [N] [O] [P] [Q] [R] [S] [T] [U] [V] [W]
[X] [Y] [Z]
10-word limit on search terms
26 Steps to 15K a Day (PageRank guidelines)
ad groups
add-ons
AddressBookToCSV web site
AdSense
Amazon Associates
Amazon Web Services (AWS)
Amazon/Google Ad Replacement (AGAR)
backup ads
public service announcements
advanced operators [See special syntaxes]
Advanced Search
AdWords
ad groups
click-through rates (CTRs) 2nd
copywriting
determining value of
editorial policies
exporting to comma-separated (CSV) file
generating
Glossarist web site
keyphrase search frequency reports
scraping
Select system
Status of the ActiveState PPM Repositories
Traffic Estimator
Wordtracker
aggregate-related statements (Googlism)
airplane registration numbers
airport information
Algorithm::Permute Perl module web site
Allwine, Tim
Amazon Hacks
Amazon/Google Ad Replacement (AGAR)
Anatomy of a Large-Scale Hypertextual Web Search Engine, The
annual reports, embellishing with images
antisocial syntaxes
API [See Google Web API]
area codes
articles, using copyright to search
Artymiak, Jacek
bad neighborhoods and webmastering
Bausch, Paul
Belt Buckle, Search Engine
Blackberry mobile email device
Blanton, Justin 2nd 3rd
web site
blog*spot web site
Blogger web site
Blosxom weblog
bookmarklets
Bookmarklets for Opera web site
Boolean
AND
default
OR
searches
cached pages, removing from Google
calculator via SMS
CapeMail, search results via
cartography application
case sensitivity
category results
cell phone, Google via
Centuryshare Calculator web site
Ciampi, Tanya Harvey
cleaning up for a Google visit
click-through rates (CTRs)
cloaking web sites
comma-separated output
command line Google Calculator application
company information, tracking through stock symbols
comparing results with other search engines
computer bug, term coined
contacts, importing into Gmail
content creation date
cookies
copyright, using to search for articles
copywriting Google AdWords
CSV files
curl library web site
date range
Centuryshare Calculator and
content creation and
custom search form application
daterange: syntax
finding items added yesterday to the index
Julian dates 2nd
searching
dates, search results by
Dave's Quick Search Deskbar web site
definition feature
definitions
via SMS
dictionary
built-in
Dictionary.com web site
Directi web site
directories
directory search
DMOZ web site
Dogpile web site
domain results, advanced
domain search
domains, summarizing results by type
doorway pages
Dooyoo Bookmarklets web site
email
Blackberry mobile device
Google search results by
excluding special characters
explicit inclusion of search terms 2nd
exporting Gmail messages
FaganFinder Google interface web site
FCC equipment ID numbers
Feeling Large
Feeling Lucky, advanced
file format options
filetype: syntax
filtering 2nd
Google Images and
FindForward web site
finding out what Google thinks of a topic
Firefox web site
FishHoo web site
Froogle
Froogle via SMS
FUSE filesystem infrastructure
G-Metrics web site
G4j Java interface for Gmail
GAPS (Google API Proximity Search)
web site
GARBO web site
gateway pages
GAWSH web site
Geeklog weblog
geotargeting
GetMail utility
GGSearch web site
Gmail
account invitations
as a filesystem
as a Windows drive
CSV files and importing contacts
documentation web site
exporting addresses from Hotmail
exporting addresses from Mac OS X Address Book
exporting addresses from Outlook and Outlook Express
exporting addresses from Yahoo! Address Book
exporting messages
FUSE filesystem infrastructure
G4j Java interface
GetMail utility
Getting More Out of Gmail
Gmail 4 Troops web site
Gmail Agent API .NET interface
GMail API for Java interface
Gmail Drive Shell Extension
Gmail for the Troops web site
Gmail Gems
Gmail-lite application
gmail.py Python interface
GmailForums
GmailFS
Gmcp backup utility
importing contacts
importing mail
in a text-only Lynx browser
libgmail
mailbox formats
Babyl
MailDir
Mbox
MH
Microsoft Outlook
MMDF
Mark Lyon apps and hacks
mobile
on a Pocket PC
Perl libraries
PHP backup utility
plus-addressing for custom email addresses
programming the interface
with .NET
with Java
with Perl
with PHP
with Python
swap web site
syntax
YPOPs utility
Gmail-lite application
Gmail-mobile application
gmail.py
GmailerXP
web site
GmailFS
installing
mounting
structures
Goodman, Andrew
Google AdSense [See AdSense]
Google Advanced Search
Google AdWords [See AdWords]
Google Alerts service 2nd
Google Answers
Google Blog weblog
Google box
refreshing
Google by AOL Instant Messenger
Google by email
Google Calculator
running on the command line
Google Cartography application
Google Compute web site
Google Deskbar web site
Google Desktop Proxy web site
Google Desktop Search
installing
privacy and
setting preferences
syntax
Google Directory 2nd
Google from IRC
Google from Microsoft Word
Google Groups
preventing material from being archived
scraping
special syntaxes
versus Google Web Search
Google Groups 2
Google Images
Advanced Image Search form
filtering
finding corporate and product logos
having images removed
syntaxes
Google Jump web site
Google Labs
Froogle Wireless
Google Compute
Google Deskbar
Google Desktop Search
Google Groups 2
Google Personalized Web Search
Google Sets
Google SMS
Google WebQuotes
Site-Flavored Google Search Box
Google Labs web site
Google Local
map navigation
web site
Google Mail Loader application
Google mindshare
Google mobile applications
Google Neighborhood application
Google News
Google PageRank extension
Google PDA Search web site
Google Phonebook
removing your listing
Google Sets web site
Google Smackdown application
web site
Google SMS
Google Toolbar
Google Translate web site
Google via cell phone
Google via PDA or Smartphone
Google WAP web site
Google Web API
10 results per query limit
ActivePerl
C# and .NET, programming in
Developer's Kit
GoogleJack!
Java, programming in
key, using
looping past 10 results limit
Net::Google Perl module
Perl, programming in
PHP, programming in
Programmer's Package Manager
Python, programming in
queries, understanding
responses, understanding
Services_Google package
signing up
SOAP::Lite Perl module, programming in
spidering and scraping
Terms and Conditions
VB .NET, programming in
Google Web APIs Developer's Kit web site
Google Weblog weblog
Google WebQuotes web site
Google Zeitgeist
web site
Google: The Missing Manual
GoogleAlert service
GoogleBot (Google spider) 2nd
GoogleJack web site
googlematic
Googlism
web site
googol
Gophoria web site
graph Google results over time
Greymatter weblog
Hammersley, Ben
Hemenway, Kevin 2nd 3rd
hidden variables
date ranges
file type
in custom search forms
number of results
site search
URLs
Hoovers web site
Hopper, Grace
Horrell, Mark
Hotmail, exporting addresses to Gmail
HTML::LinkExtor web site
Hwang, Johnvey
I'm Feeling Lucky button
idioms
Iff, Morbus
image search
importing mail into Gmail
inappropriate content, reporting
index
items added yesterday application
instant messenger, Google results via
integrating results into a web page
interface language 2nd
Internet Society (ISOC)
intitle: syntax
inurl: syntax
IRC, Google results via
Java web site
Java, programming Gmail interface with
Joe Maller's Translation Bookmarklets web site
John Battelle's Searchblog weblog
Johnson, Leland
Johnson, Steven
web site
Jones, Richard
JRE web site
Julian dates 2nd
FaganFinder web site
Julian::Day module web site
Jung graph package web site
keyphrase search frequency reports
keywords
repetition in queries
weighting
language
machine translation
search language
specific for country
tools
language options for search results
libgmail 2nd 3rd
limits, 101K page size
link count
link: syntax 2nd
LiveJournal weblog
local business listing via SMS
local information 2nd
LuckyMarklets web site
Lynx browser, running Gmail in
Lyon, Mark 2nd
Mac OS X Address Book, exporting addresses to Gmail
MakeAShorterLink web site
Manila weblog
MapQuest maps
maps
Mastering Regular Expressions
META tags
metadata
Microsoft Word, Google results via
Milly's Bookmarklets web site
Milstein, Sarah
minus sign
misspellings, usefulness of
mobile applications for Google
mobile Gmail
mobile texting tips
Movable Type web site
Mozilla Googlebar
multiple iterations of search terms
Mutton, Paul
NCSA SSI Tutorial web site
negating a search term 2nd
.NET web site
.NET, programming Gmail interface with
news feature
Newsmap web site
number of results per page, setting
number range operator
package tracking numbers
page summary
PageRank 2nd
algorithm
Calculator
extension for Mozilla
PageRank abusers
parallel texts
patent numbers
PDA, Google via
PDF files
Perl and LWP
Perl web site
Perl, programming Gmail interface with
permuting search terms
Phoenix, Tom
phonebook
caveats
finding institution phonebooks
removing your listing
reverse number lookup
syntaxes
via SMS
PHP web site
PHP, programming Gmail interface with
PHP-based applications
phrase searches
Pilgrim, Mark
web site
PircBot Java IRC API web site
Pitas weblog
plug-ins [See add-ons]
plus sign
plus-addressing
pMachine weblog
Pocket PC, running Gmail on
popularity contest application
Prefuse graph visualization framework web site
PRGooglebar
product pricing via SMS
product search with Froogle
proper names
properties
prototyping with Python
proximity searches
PST Reader
public service announcements and AdSense
Python
as a language for rapid prototyping
Mega Widgets toolkit
programming Gmail interface with
web site
query word combinations
query words [See search terms]
query, permuting
Radial Layout algorithm
Radio Userland web site
Random Personal Picture Finder web site
random results
Random Yahoo! Link web site
ranking algorithm
recipes, using Google
removing material from Google
repeating search terms in queries
residential phone numbers via SMS
results
category
comparing with other search engines
excluding weblogs
Google box
interpreting
limiting to a specified depth
metadata
page summary
random
restricting to top-level
returning comma-separated output
setting for researchers
setting number per page
summarizing by types of domains
tweaking with URLs
visually displayed
results, search
visual display (TouchGraph)
results, tracking counts over time
reverse phone number lookup
robots.txt file
Rocketinfo web site
RSS/Atom feed reader web site
SafeSearch filtering 2nd 3rd
Savikas, Andrew
Scattersearch application
Schwartz, Randal L.
scraping
Google AdWords
Google Groups
Yahoo Buzz application
screen scraping
search
comparing results over fifty-year spans
search engine basics
Search Engine Belt Buckle
search engine optimization [See webmastering and Google]
search form application
search forms
creating your own
Search Grid web site
search language
search results by date
search terms
10-word limit
combinations
favoring obscure keywords
location in document
multiple iterations of
permuting
popularity comparison application
repetition in queries
special characters
weighting
searching for specific file formats
SearchSpy web site
Sells, Chris 2nd
Shapiro, Alex
Shay, Kevin
Shorl web site
Short Message Service (SMS)
shortcodes
Sifry, David
Sinclair ZX-80 personal computer web site
site: syntax
slang 2nd
custom form for searching
Dictionary of Slang web site
Glossarist web site
industry
Law.com web site
MedTerms.com web site
Netlingo web site
On-Line Medical Dictionary web site
Probert Encyclopedia web site
strategies
Surfing for Slang web site
Tech Encyclopedia web site
Webopedia web site
Whatis web site
Smackdown
smartphones, Google via
SMS
Calculation calculator
charges for googling
Definition dictionary
Froogle Prices
Google Local Business Listing
Google via
how to use web site
Residential Phone Number
Web Search
Zip Code
snapshot of Google top queries over time
SOAP::Lite web site
Soople web site
special characters
special syntaxes
allintext:
allintitle:
allinurl:
antisocial syntax elements
bphonebook:
cache:
daterange:
filetype:
Gmail
Google Groups
Google Images
inanchor:
info:
intext:
intitle:
inurl:
link:
link: 2nd
mixing
phonebook: 2nd
related:
rphonebook:
site: 2nd
stocks:
spellchecker 2nd 3rd
spidering 2nd
Spidering Hacks
stemming
stock quotes
stock symbols, for tracking company information
stop words
street map application
street maps
studs
subject indexes
Sullivan, Danny
synonym operator with search terms
synonyms
scraping list of
syntaxes
link:
Tabke, Brett 2nd
Technorati web site
web sites
Technorati
texting on a mobile device
Thunderbird mail application web site
tilde character
TinyURL web site
toolbars
Google Toolbar [See Google Toolbar]
topic searches
Torrone, Phillip M.
TouchGraph Google Browser
web site
Trachtenberg, Adam
web site
TrackBack web site
tracking package numbers
translating
machine translation
translation tools
trends, tracking with geotargeting
uJournal weblog
UPC codes
URL-shortening services
URLs
generating from the Open Directory Project (ODP)
tweaking results
Vehicle ID numbers (VIN)
visual display of Google results
visual display of search results (TouchGraph)
vocabularies, specialized
WAP/WML
web hacks
advanced
Web Robots Database web site
web search via SMS
Web Services Description Language (WSDL) 2nd
web sites
about cloaking
about doorway pages
ActiveState
Adam Trachtenberg
AddressBookToCSV
Amazon Associates
Amazon Web Services (AWS)
Amazon/Google Ad Replacement (AGAR)
Apache SOAP
Blackberry
Bookmarklets for Opera
C# Visual Studio .NET
Cape Clear
content, removing from index
Crimson XML parser
curl library
Dave's Quick Search Deskbar
Directi
Dogpile
Dooyoo Bookmarklets
Expat
FaganFinder Google interface
Firefox
FishHoo!
FUSE filesystem infrastructure
GAPS
GARBO
GAWSH
GetMail utility
GGSearch
Glossarist, The
Gmail 4 Troops
Gmail documentation
Gmail Drive Shell Extension
Gmail for the Troops
Gmail swap
Gmail-lite application
gmail-mobile
application
project page
gmail.py
GmailerXP
GmailFS (Gmail filesystem)
Gmcp backup utility
Google AdSense
Google AdWords
Google Alerts
Google Answers
Google Blog
Google Compute
Google Deskbar
Google Desktop Proxy
Google Desktop Search
Google Jump
Google Labs
Google Mail Loader
Google PageRank extension
Google PDA Search
Google Personalized Web Search
Google Sets
Google Smackdown
Google Smackdown application
Google Translate!
Google WAP
Google Web API
Google Web API Developer's Kit
Google Weblog
Google webmaster info page
Google WebQuotes
Google Zeitgeist
GoogleAlert
Googlebar
GoogleJack!
Googlematic
Googlism
Gophoria
Hoovers
how to use Google SMS
HTML::LinkExtor
HTML::TokeParser Perl module
Java 2 Platform, Standard Edition (J2SE)
Joe Maller's Translation Bookmarklets
John Battelle's Searchblog
Johnvey Hwang
JRE
Jung graph package
Justin Blanton
LuckyMarklets
Mac OS X Safari
MakeAShorterLink
Mark Pilgrim
Microsoft Corporation
Milly's Bookmarklets
Mozilla
Net::Google Perl module
Netscape
Newsmap
NuSOAP PHP module
Onfocus
Opera
PageRank Calculator
Perl
phonebook listing, removing
PHP
backup utility
PircBot Java IRC API
Prefuse graph visualization framework
PRGooglebar
PST Reader
PyGoogle wrapper module
Python
FUSE bindings
Mega Widgets toolkit
Rael Dornfest
Random Personal Picture Finder
Rocketinfo
RSS/Atom
SearchSpy
Services_Google package
Shorl
Sinclair ZX-80 personal computer
Site-Flavored Google Search Box
SOAP::Lite
Soople
SSL support
Thunderbird mail application
TinyURL
TouchGraph Google Browser
TrackBack
Web Robots Database
WebCollage
Webmaster World
Windows Media Encoder 9
wireless Froogle
Wordtracker
XScreenSaver
Yahoo Buzz
Yahoo! Daily News
YPOPs utility
Webb, Matt
WebCollage web site
weblogs
blog*spot web site
Blogger web site
Blosxom
excluding from results
finding commentary on
Geeklog
Greymatter
LiveJournal
Manila
Movable Type web site
Pitas
pMachine
Radio Userland web site
uJournal
WordPress
Webmaster World web site
webmastering and Google
26 Steps to 15K a Day (PageRank guidelines)
bad neighborhoods
being a good search engine citizen
cleaning up for a Google visit
hidden text
META tags
PageRank
PageRank abusers
PageRank Calculator
removing material from Google
search engine basics
search engine optimization template
submitting sites to Google
webmaster Info pages
weighting query keywords
wildcards
full-word
hacking the 10-word limit with
Windows Media Encoder 9 web site
wireless Froogle web site
WordPress weblog
Wordtracker web site
WSDL [See Web Services Description Language]
WWW Search Interfaces for Translators web site
XML format for describing web services
XML parser library (Expat)
XScreenSaver web site
Yahoo Address Book, exporting addresses to Gmail
Yahoo Buzz Index web site
Yahoo Buzz web site
Yahoo Daily News web site
Yahoo Directory Mindshare in Google application
Yahoo Finance w eb site
Yahoo maps
Yahoo! Local
Zawinski, Jamie 2nd
Zip Code via SMS