Introduction to Webometrics
Mike Thelwall (@mikethelwall), Professor of Information Science, Statistical Cybermetrics Research Group, University of Wolverhampton
Reminder of pre-workshop task
Delegates were asked to join YouTube and to post comments, and replies to earlier comments, on the video "Department of Library and Information Science, Delhi" (http://www.youtube.com/watch?v=_-OAsF9uRfc). These contributions will be discussed at the end of the session, including the self-declared age and gender information from YouTube.
Overview of “Webometrics”
What is Webometrics?
- Gathering, processing and analysing large-scale data from the web (web pages, hyperlinks, blogs, Web 2.0) for many purposes, including the study of online communication
What can Webometrics offer other researchers?
- Software to gather data from web sites, search engines, social network sites and blogs
- Methods to extract useful patterns
Common data sources
- Webometric Analyst software: Twitter, YouTube, the Web, Technorati, Bing
- Bespoke: any resource with an API, page scraping of other sites, SocSciBot web crawler
http://lexiurl.wlv.ac.uk
1. Background: Webometrics
Webometrics is about gathering data on the Web and measuring aspects of the Web, for varied social science purposes:
- web sites
- web pages
- hyperlinks
- YouTube video commenter networks
- web search engine results
- MySpace Friend networks
- Twitter or blog trends
New problems: Web-based phenomena
Webometrics can analyse online academic communication:
- Why do academic web sites interlink?
- Which academic web sites interlink?
- What academic interlinking patterns exist?
- Which web sites, groups or documents have the most online impact, and why?
Old problems: Offline phenomena reflected online
Some offline phenomena have measurable online reflections:
- International communication
- Inter-university collaboration
- University-business collaboration
- The impact or spread of ideas
- Public opinion about science
Example: The online impact of research groups (NetReAct)
Example: Links between EU universities
[Figure labels: normalised linking, smallest countries removed; geopolitically connected; Sweden, Finland, Norway, UK, Germany, Austria, Switzerland, Poland, Italy, Belgium, Spain, France, NL]
International biofuels research network
Data Gathering/Processing Tools
- Webometric Analyst: web citations, web text, YouTube, Flickr, Technorati; submits thousands of queries to Bing and summarises the results in standard ways
- SocSciBot: links, web text; web crawler and analyser
2. Altmetrics in traditional research evaluation
- Altmetrics can supplement traditional citation impact evidence with evidence of non-traditional online impact, e.g., educational or discussion-based impact
- Often weaker evidence than citation data, but useful for research groups that have non-standard types of impact
The Integrated Online Impact Indicator (IOI)
Combines a range of online sources into one indicator:
Google Scholar + Google Books + Course reading lists + Google Blogs + PowerPoint presentations = IOI
OR select individual components separately
Invented by Kayvan Kousha
New source 1: Google Scholar
- Wider evidence of academic impact
- Wider types of academic publications, plus some non-academic publications
- Not reliable
- Coverage variable
- Can’t be automatically queried
- Free
New source 2: Google Books
- Books are typically not indexed in WoS or Scopus
- Relevant in book-based disciplines (arts, humanities, some social sciences)
- Reliability unknown, but probably not good
- Coverage variable
- Can be automatically queried
- Free
[Clifford Lynch]
New source 3: Course reading lists
- Evidence of educational impact
- Queries can be constructed automatically to detect individual articles in online syllabuses
- Results are obtained via advanced Google/Yahoo/Live Search queries
- Works for most articles
- Fails for short, common article titles
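A minimal sketch of how such queries might be constructed automatically. The exact query form used for the IOI is not given on the slide: the syllabus-related terms, the `syllabus_query` name and the four-word cut-off for "short" titles are hypothetical, and operator syntax (quoted phrases, OR) varies between search engines.

```python
from typing import Optional

# Hypothetical context terms; the slide does not give the actual query form.
SYLLABUS_TERMS = '(syllabus OR "reading list" OR "course outline")'


def syllabus_query(title: str, min_title_words: int = 4) -> Optional[str]:
    """Build a phrase query that looks for an article title on syllabus-like pages.

    Returns None for short titles, because a short, common phrase matches
    many unrelated pages (the failure case noted on the slide).
    """
    words = title.replace('"', "").split()  # drop quotes, normalise spaces
    if len(words) < min_title_words:
        return None
    return f'"{" ".join(words)}" {SYLLABUS_TERMS}'
```

For example, a two-word title such as "Citation analysis" yields no query at all, rather than a query that would return mostly false matches.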
New source 4: Blogs
- Evidence of impact on discussions
- Educational impact, evidence of public dissemination, academic impact in discursive subjects?
- Not possible to automate in the largest database (Google Blogs)?
- Not a well-researched area
New source 5: PowerPoint presentations
- Evidence of educational/scholarly impact
- Especially relevant for discursive subjects?
- Automated Live Search/Yahoo advanced queries
IOI = a*Scholar + b*PowerPoint + c*Blogs + d*Syllabus + e*Books
Or use qualitative analyses of the different sources
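The IOI formula is a weighted sum of per-source counts. The sketch below assumes the counts have already been gathered. The weights a to e are not given on the slide, so they are left to the analyst, and the `integrated_online_impact` name is hypothetical. Setting all but one weight to zero corresponds to selecting a single component.

```python
# Source names follow the slide's formula; the weight values are not given
# on the slide and must be chosen by the analyst.
SOURCES = ("Scholar", "PowerPoint", "Blogs", "Syllabus", "Books")


def integrated_online_impact(counts: dict, weights: dict) -> float:
    """IOI = a*Scholar + b*PowerPoint + c*Blogs + d*Syllabus + e*Books."""
    missing = [s for s in SOURCES if s not in counts or s not in weights]
    if missing:
        raise KeyError(f"missing count or weight for: {missing}")
    return sum(weights[s] * counts[s] for s in SOURCES)
```

Keeping the per-source counts separate from the weights makes it easy to report the components individually as well as the combined indicator.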
2. Sentiment Strength Detection in the Social Web with SentiStrength
- Detect positive and negative sentiment strength in short informal text
- Develop workarounds for the lack of standard grammar and spelling
- Harness emotion expression forms unique to MySpace or CMC (e.g., :-) or haaappppyyy!!!)
- Classify simultaneously as positive 1-5 AND negative 1-5 sentiment
Thelwall, M., Buckley, K., & Paltoglou, G. (in press). Sentiment strength detection for the social Web. Journal of the American Society for Information Science and Technology.
Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (2010). Sentiment strength detection in short informal text. Journal of the American Society for Information Science and Technology, 61(12), 2544-2558.
SentiStrength Algorithm - Core
- A list of 2,489 positive and negative sentiment term stems with strengths (1 to 5), e.g., ache = -2, dislike = -3, hate = -4, excruciating = -5; encourage = 2, coolest = 3, lover = 4
- A sentence takes the strength of its strongest positive and strongest negative term; a text with multiple sentences takes the strongest sentence on each scale
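Example (term strengths in brackets; sentence scores given as positive, negative):
My legs ache. [ache = -2] → 1, -2
You are the coolest. [coolest = 3] → 3, -1
I hate Paul but encourage him. [hate = -4, encourage = 2] → 2, -4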
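The core rule can be sketched in a few lines of Python. This is an illustration, not SentiStrength itself. It uses only the seven example terms from the slide as its lexicon and matches whole words rather than stems. The function names are hypothetical. A sentence with no sentiment terms scores 1 (positive) and -1 (negative), which are the neutral ends of the two scales.

```python
import re

# Illustrative lexicon: only the example terms from the slide. The real list
# has 2,489 entries and matches word stems, not just whole words.
LEXICON = {
    "ache": -2, "dislike": -3, "hate": -4, "excruciating": -5,
    "encourage": 2, "coolest": 3, "lover": 4,
}


def score_sentence(sentence: str) -> tuple[int, int]:
    """Return (positive, negative) for one sentence: its strongest terms.

    Positive runs 1..5 and negative -1..-5, where 1 and -1 mean no sentiment.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    strengths = [LEXICON.get(w, 0) for w in words]
    positive = max([s for s in strengths if s > 0], default=1)
    negative = min([s for s in strengths if s < 0], default=-1)
    return positive, negative


def score_text(text: str) -> tuple[int, int]:
    """Score each sentence, then keep the strongest sentence on each scale."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    scores = [score_sentence(s) for s in sentences] or [(1, -1)]
    return max(p for p, _ in scores), min(n for _, n in scores)
```

Applied to the three-sentence example as a whole, `score_text` keeps the strongest sentence on each scale and returns (3, -4).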
Extra sentiment methods
- Spelling correction: nicce -> nice
- Booster words alter strength: very happy
- Negating words flip emotions: not nice
- Repeated letters boost sentiment/+ve: niiiice
- Emoticon list: :) = +2
- Exclamation marks count as +2 unless -ve: hi!
- Repeated punctuation boosts sentiment: good!!!
- Negative emotion is ignored in questions: u h8 me?
- Sentiment idiom list: shock horror = -2
Online at http://sentistrength.wlv.ac.uk/
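Several of these modifiers can be added on top of a lexicon lookup, as in the sketch below. Everything here is an assumption made for illustration. The lexicon is a toy one. Boosters and repeated letters each add 1 to a term's strength. Negation is implemented as a simple sign flip. The question rule applies to whole sentences that end in "?". Spelling correction, exclamation marks, repeated punctuation and idioms are omitted, and SentiStrength's actual rules may differ.

```python
import re

# Toy lexicon and rule sizes for illustration only (assumptions, not the
# real SentiStrength resources, which are far larger).
TOY_LEXICON = {"nice": 2, "good": 2, "happy": 3, "hate": -4, "h8": -4}
EMOTICONS = {":)": 2, ":-)": 2, ":(": -2}  # slide: :) = +2
BOOSTERS = {"very": 1, "really": 1}        # intensify the next term by 1
NEGATORS = {"not", "never"}                # flip the next term's sign
REPEAT = re.compile(r"(.)\1{2,}")          # 3+ identical characters in a row


def lookup(word: str) -> tuple[int, bool]:
    """Return (strength, had_repeated_letters); strength 0 = not a sentiment term."""
    if word in TOY_LEXICON:
        return TOY_LEXICON[word], False
    # 'niiiice' -> 'nice' (squeeze to one); 'goooood' -> 'good' (squeeze to two)
    for replacement in (r"\1", r"\1\1"):
        candidate = REPEAT.sub(replacement, word)
        if candidate != word and candidate in TOY_LEXICON:
            return TOY_LEXICON[candidate], True
    return 0, False


def score_with_modifiers(sentence: str) -> tuple[int, int]:
    """Return (positive 1..5, negative -1..-5) for one sentence."""
    positive, negative = 1, -1
    boost, negate = 0, False  # modifiers waiting for the next sentiment term
    for token in sentence.lower().split():
        if token in EMOTICONS:
            strength = EMOTICONS[token]  # emoticons taken at face value
        else:
            word = token.strip(".,!?;:")
            if word in BOOSTERS:
                boost += BOOSTERS[word]
                continue
            if word in NEGATORS:
                negate = True
                continue
            strength, repeated = lookup(word)
            if strength == 0:
                continue  # pending modifiers carry over to the next term
            step = boost + int(repeated)
            strength += step if strength > 0 else -step  # intensify
            if negate:
                strength = -strength  # 'not nice' becomes negative
            boost, negate = 0, False
        if strength > 0:
            positive = max(positive, min(strength, 5))
        else:
            negative = min(negative, max(strength, -5))
    if sentence.rstrip().endswith("?"):
        negative = -1  # 'u h8 me?': negative emotion ignored in questions
    return positive, negative
```

For example, "very happy" scores (4, -1), "not nice" scores (1, -2), and "u h8 me?" scores (1, -1).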
Tests against human coders
SentiStrength agrees with humans as much as they agree with each other.
Correlation with human coders (1 = perfect agreement, 0 = random agreement):
Data set          Positive scores   Negative scores
YouTube           0.589             0.521
MySpace           0.647             0.599
Twitter           0.541             0.499
Sports forum      0.567             0.541
Digg.com news     0.352             0.552
BBC forums        0.296             0.591
All 6 data sets   0.556             0.565
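The agreement figures are correlations between SentiStrength's scores and the human coders' scores for the same texts. The slide does not name the coefficient. Assuming Pearson correlation, it can be computed as below. In a real comparison the inputs would be the paired per-text scores, with one comparison for positive strength and one for negative strength.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between paired scores (e.g. program vs. human)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when scores are constant")
    return cov / (sx * sy)
```

A value near 1 means the program ranks texts much as the coders do. A value near 0 means there is no linear relationship between the two sets of scores.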
Why the poor results for BBC (and Digg)?
Irony, sarcasm and expressive language, e.g.:
- David Cameron must be very happy that I have lost my job.
- It is really interesting that David Cameron and most of his ministers are millionaires.
- Your argument is a joke.
2. Twitter: sentiment in major media events
- Analysis of a corpus of 1 month of English Twitter posts (35 million, from 2.7M accounts)
- Automatic detection of spikes (events)
- Assessment of whether sentiment changes during major media events
Automatically-identified Twitter spikes
[Figure: proportion of tweets mentioning each keyword, 9 Feb 2010 to 9 Mar 2010]
Thelwall, M., Buckley, K., & Paltoglou, G. (2011). Sentiment in Twitter events. Journal of the American Society for Information Science and Technology, 62(2), 406-418.
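The slides do not describe how spikes were detected. As a stand-in, the sketch below flags an hour as a spike when the proportion of tweets mentioning a keyword exceeds the mean of the preceding hours by more than a few standard deviations. The window length, the threshold and the `find_spikes` name are assumptions, not the method of the cited paper.

```python
from statistics import mean, pstdev
from typing import List


def find_spikes(proportions: List[float], window: int = 24,
                threshold: float = 3.0) -> List[int]:
    """Return indices of hours whose keyword share is unusually high.

    An hour counts as a spike when its value exceeds the mean of the previous
    `window` hours by more than `threshold` population standard deviations.
    """
    spikes = []
    for i in range(window, len(proportions)):
        history = proportions[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        # Guard against a perfectly flat history (sigma == 0).
        if proportions[i] > mu + threshold * max(sigma, 1e-9):
            spikes.append(i)
    return spikes
```

Comparing each hour only with the hours before it means a long spike stops being flagged once it becomes part of the recent baseline. That behaviour may or may not be wanted.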
Chile
[Figure: proportion of tweets mentioning Chile, with average positive and negative sentiment strength for all matching posts and for subjective posts only, 9 Feb 2010 to 9 Mar 2010; annotation: increase in negative sentiment strength]
#oscars
[Figure: proportion of tweets mentioning the Oscars (% matching posts), with average positive and negative sentiment strength for all matching posts and for subjective posts only, 9 Feb 2010 to 9 Mar 2010; annotation: increase in negative sentiment strength]
Sentiment and spikes
Statistical analysis of the top 30 events:
- Strong evidence that higher-volume hours have stronger negative sentiment than lower-volume hours
- No evidence that higher-volume hours have different positive sentiment strength than lower-volume hours
=> Spikes are typified by small increases in negativity
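The sketch below gives a simple descriptive version of this comparison. It splits hours at the median tweet volume and compares average negative sentiment strength in the two halves. It assumes that hourly volumes and mean negative strengths (as magnitudes, 1 to 5) have already been computed. It is not the statistical test used in the study.

```python
from statistics import mean
from typing import List, Tuple


def compare_negativity(hours: List[Tuple[int, float]]) -> Tuple[float, float]:
    """hours: (tweet_count, mean_negative_strength) pairs, one per hour.

    Returns (mean negativity of high-volume hours, mean negativity of
    low-volume hours), splitting at the median volume; with an odd number
    of hours the median hour is left out.
    """
    if len(hours) < 2:
        raise ValueError("need at least two hours to compare")
    ranked = sorted(hours, key=lambda h: h[0])
    half = len(ranked) // 2
    low, high = ranked[:half], ranked[-half:]
    return mean(n for _, n in high), mean(n for _, n in low)
```

A higher first value is consistent with the slide's finding. Testing whether the difference is significant across events would need a proper statistical test.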
3. YouTube video comments
- Up to 1,000 comments per video can be downloaded via Webometric Analyst (or the YouTube API)
- A good source of social web text data
- Analysis of all comments on a pseudo-random sample of 35,347 videos with fewer than 1,000 comments
 
Using Webometric Analyst
- Download free from http://lexiurl.wlv.ac.uk
- Start the program, select the classic interface, then open the YouTube tab
Reply networks
- Illustrate the replies to a YouTube video in network form
- Reveal the age and gender of posters
- Reveal patterns of discussion in the replies (if any)
- Take up to 25 minutes per video to make with Webometric Analyst
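A reply network is a directed graph with one node per commenter and an arrow from each replier to the person they replied to. The sketch below assumes the comments have already been downloaded and that each one records the author it replies to. The `Comment` record and its fields are hypothetical, and this is not how Webometric Analyst is implemented. Node attributes such as declared age and gender could be stored in a separate dictionary keyed by author.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Comment:
    author: str
    reply_to: Optional[str] = None  # author of the comment being replied to, if any


def reply_network(comments: Iterable[Comment]) -> Counter:
    """Directed edges (replier, replied-to author), weighted by reply count."""
    edges: Counter = Counter()
    for c in comments:
        # Skip top-level comments and (an added assumption) self-replies.
        if c.reply_to and c.reply_to != c.author:
            edges[(c.author, c.reply_to)] += 1
    return edges
```

Edge weights record how often one person replied to another. This separates a single exchange from an ongoing back-and-forth.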
Reply network: extended core interactions, "2x2=5" video
- Nodes (people): blue = male, pink = female
- Arrows (replies): red = happy replies, black = angry replies
The 10 most ridiculous Black Metal videos
- A very sparse reply network
- Nodes are mostly connected in 2s and 3s
Black Metal vs. Deathcore
- Denser reply network
- Ongoing debates and contentious issues
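"Sparse" and "denser" describe how many of the possible connections are present. The slides judge this visually. One standard way to quantify it, offered here as an assumption, is directed network density: the number of arcs divided by the n(n-1) possible arcs between n people.

```python
from typing import Set, Tuple


def density(nodes: Set[str], arcs: Set[Tuple[str, str]]) -> float:
    """Directed density: distinct arcs present / n(n-1) possible arcs."""
    n = len(nodes)
    if n < 2:
        return 0.0
    distinct = {(a, b) for a, b in arcs if a != b}  # ignore self-loops
    return len(distinct) / (n * (n - 1))
```

A network of pairs and triples, like the Black Metal example, gives a value close to 0. Denser debate networks score higher, although density also tends to fall as the number of participants grows.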
Other networks
Connections can be:
- Friendship in YouTube (must be reciprocal)
- Subscription in YouTube (non-reciprocal, based upon interest in video content)
- Friends in common in YouTube: suggests factors in common (e.g., bands) rather than people in common
- Subscriptions in common in YouTube: again suggests factors in common (e.g., bands) rather than people in common
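A "friends in common" network links two users when their Friend lists overlap, and the edge weight is the number of shared Friends. The sketch assumes the Friend lists have already been downloaded into a dictionary. The function name and the `min_shared` threshold are hypothetical. Passing subscription lists instead of Friend lists gives a subscriptions-in-common network.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def common_friends_network(friends: Dict[str, Set[str]],
                           min_shared: int = 1) -> Dict[Tuple[str, str], int]:
    """Undirected edges between users, weighted by number of shared Friends."""
    edges = {}
    for a, b in combinations(sorted(friends), 2):  # each unordered pair once
        shared = len(friends[a] & friends[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges
```

Raising `min_shared` removes edges based on a single shared Friend, such as one popular band, and leaves the more strongly connected core.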
Very sparse Friend network (a common pattern)
Common Friends network – with a densely connected core
Large-scale analysis of YouTube
Purpose: to discover patterns, norms and unusual behaviour in YouTube
Method:
- Generate a large sample of YouTube videos by running searches for many terms from a large word list and selecting a video at random from each set of results
- Extract properties of the videos and commenters
- Calculate averages and distributions
- Examine extreme videos
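The sampling step can be sketched as follows. The `search` argument stands for a hypothetical function that returns video IDs for a search term, for example a wrapper around the YouTube API. Skipping duplicate videos is an added assumption that the slide does not state. Seeding the random generator makes a sample reproducible.

```python
import random
from typing import Callable, List, Optional


def sample_videos(words: List[str], search: Callable[[str], List[str]],
                  seed: Optional[int] = None) -> List[str]:
    """Pick one random video ID per search term, skipping empty result sets."""
    rng = random.Random(seed)
    sample: List[str] = []
    seen = set()
    for word in words:
        results = search(word)
        if not results:
            continue
        video_id = rng.choice(results)
        if video_id not in seen:  # avoid counting a video twice (added assumption)
            seen.add(video_id)
            sample.append(video_id)
    return sample
```

Because the search engine decides which videos each term returns, the sample is pseudo-random rather than a uniform sample of all YouTube videos.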
Comments are mainly positive
 
Controversial and non-controversial?
Conclusions
- Sentiment analysis allows large-scale social web analysis
- YouTube videos can be analysed easily by taking a network perspective on their comments
- Webometric Analyst: http://lexiurl.wlv.ac.uk/
- You can make a reply network of the discussions of video _-OAsF9uRfc by following the instructions at http://lexiurl.wlv.ac.uk/searcher/youtubereplies.html
 


Editor's Notes

  • #31: Example of highly popular amateur video triggering lots of discussions
  • #32: Webometric Analyst uses the YouTube API to download information on one or more videos (Windows only).
  • #38: Friend network
  • #39: Friends in common
  • #42: Intelligent design debate
  • #43: Arjona link