DS 5110 – Lecture 6

Semi-Structured Data

Roi Yehoshua
Agenda
 HTML
 Web scraping
 XML
 JSON
 Web API, RESTful services

2 Roi Yehoshua, 2023

Semi-Structured Data
 Data that doesn’t conform to a data model, but has some structure
 A self-describing structure
 Contains tags that describe the data
 No separation between data and schema
 Examples of semi-structured data:
 HTML pages
 XML
 JSON
 E-mails
 TCP/IP packets


Semi-Structured Data
 Pros
 Flexible schema, can be easily changed
 Data is portable
 Can be used to exchange data between different databases
 Support for nested or hierarchical data
 Cons
 Interpretation of the relationships in the data is more difficult
 Queries are less efficient than in the relational model
 Storage cost is higher
 Harder to define and enforce constraints on the data


HTML
 HTML is the standard language for creating Web pages
 Consists of a series of elements which tell the browser how to display the content
 An HTML element is defined by a start tag, some content, and an end tag:
<tagname>Content goes here...</tagname>

 Some HTML elements have no content (like the <br> element)

 These elements are called empty elements
 Elements can have attributes that provide additional information about the element
 Attributes are specified in the start tag of the element
 For example, the href attribute of <a> specifies the URL of the page the link goes to:
<a href="[Link]">google</a>


HTML Example
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title>Sample Page</title>
</head>
<body>
<h1 style="color:darkcyan">Header</h1>
</body>
</html>


DOM (Document Object Model)
 The Document Object Model (DOM) is a programming interface for web documents
 Used mainly for XML, HTML and SVG documents
 Represents the document as a tree of nodes, known as the DOM tree
 DOM methods allow programmatic access to the tree

<!DOCTYPE html>
<html lang="en">
<head>
<title>My Document</title>
</head>
<body>
<h1>Header</h1>
<p>Paragraph</p>
</body>
</html>
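The tree structure can be made visible with a short sketch. It uses Python's standard xml.dom.minidom module, which works here only because this sample page happens to be well-formed XML (real-world HTML often is not); the helper name element_tree is made up for the illustration.

```python
from xml.dom import minidom

# The sample page from the slide (the DOCTYPE line is left out for brevity).
html = """<html lang="en">
<head><title>My Document</title></head>
<body><h1>Header</h1><p>Paragraph</p></body>
</html>"""

def element_tree(node, depth=0):
    """Return one indented line per element node, in document order."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        depth += 1
    for child in node.childNodes:
        lines.extend(element_tree(child, depth))
    return lines

dom = minidom.parseString(html)
tree_lines = element_tree(dom.documentElement)
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one level of the DOM tree: html is the root, head and body are its children, and so on.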


Web Scraping
 Web scraping is a technique of extracting information from websites
 It is used to transform semi-structured data (HTML) into structured data (e.g., a database table or spreadsheet)
 Web scraping is generally legal when the data is publicly available on the web
 Be careful when scraping personal data, intellectual property or confidential data
 In any case, check the website’s Terms of Service before scraping its data


Web Scraping
 You need two Python libraries for scraping data:
 Requests: used for fetching web pages from URLs
 BeautifulSoup: a library for pulling data out of HTML and XML files
 These packages are included with the Anaconda distribution
 To install them manually, use pip:
pip install requests
pip install beautifulsoup4


Web Scraping
 The following example extracts data about US states from Wikipedia
[Link]


Loading the Page Content
 First, we need to retrieve the HTML of the page using the requests library:
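A minimal sketch of this step. The URL is an assumption (the Wikipedia list of US states and territories), and fetch_html is a hypothetical helper name.

```python
import requests

# Assumed URL of the page to scrape.
URL = "https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States"

def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text

# Usage (performs a network request):
# html = fetch_html(URL)
# print(html[:200])
```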


Parsing the Page Content
 Beautiful Soup is a Python library for parsing HTML and XML documents
 To parse the HTML content use the following code:

 The second argument, "html.parser", tells Beautiful Soup to use Python’s built-in HTML parser
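A minimal sketch of the parsing step. It uses a small inline document as a stand-in for the downloaded page, so that it runs without a network request.

```python
from bs4 import BeautifulSoup

# Stand-in for the HTML text returned by the requests library.
html = """<html><head><title>Sample Page</title></head>
<body><h1>Header</h1><p>First paragraph</p><p>Second paragraph</p></body></html>"""

# The second argument selects the parser.
soup = BeautifulSoup(html, "html.parser")
print(type(soup).__name__)
```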


Navigating using Tag Names
 The simplest way to navigate the parsed tree is to use the name of the tag you want
 For example, if you want the <title> tag, just say soup.title:

 To get only the text content of the tag use its text attribute:

 You can use the dot multiple times to zoom in on a deeper part of the tree:

 Using a tag name as an attribute will give you only the first tag by that name
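The three forms of navigation above can be sketched as follows, on a small inline document used as sample data.

```python
from bs4 import BeautifulSoup

html = ("<html><head><title>Sample Page</title></head>"
        "<body><p>First</p><p>Second</p></body></html>")
soup = BeautifulSoup(html, "html.parser")

title_tag = soup.title        # the whole <title> element
title_text = soup.title.text  # only its text content
first_p = soup.body.p         # chained dots; returns only the FIRST <p>

print(title_tag)
print(title_text)
print(first_p.text)
```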


Inspecting Elements
 For easier viewing, you can prettify any Beautiful Soup object when you print it out
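A short sketch of prettify() on a one-line sample document. The method returns a string with one tag or text node per line, indented by nesting depth.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><body><h1>Header</h1><p>Paragraph</p></body></html>",
                     "html.parser")
pretty = soup.prettify()
print(pretty)
```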


Navigation Attributes
 Each tag has a few attributes that allow you to traverse the tree:
 .contents – a list of the tag’s children
 .children – an iterable over the tag’s children
 .descendants – an iterable over all of the tag’s descendants
 .parent – the element’s parent
 .next_sibling – the next sibling of the element on the same level
 .previous_sibling – the previous sibling of the element on the same level
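The navigation attributes can be tried on a small list. The sample HTML is written without whitespace between tags, because whitespace would show up as extra text nodes among the children and siblings.

```python
from bs4 import BeautifulSoup

html = "<ul><li>HTML</li><li>XML</li><li>JSON</li></ul>"
soup = BeautifulSoup(html, "html.parser")
ul = soup.ul

children = ul.contents                         # list of direct children
child_texts = [li.text for li in ul.children]  # same children, as an iterator
descendant_count = len(list(ul.descendants))   # 3 <li> tags + 3 text nodes

second = children[1]
print(second.parent.name)            # tag name of the parent
print(second.next_sibling.text)      # text of the following <li>
print(second.previous_sibling.text)  # text of the preceding <li>
```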


Searching the Tree
 The two most common methods for searching the tree are:
 find(name, attrs) – retrieves the first tag that matches your filters
 find_all(name, attrs) – retrieves all the tags that match your filters
 The name argument searches for only tags with certain names
 Any argument that’s not recognized becomes a filter on one of the tag’s attributes
 For example, let’s find all the link elements in the page with the CSS class 'image':
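A minimal sketch of both methods on made-up sample links. Because class is a reserved word in Python, Beautiful Soup takes the CSS class filter as the keyword argument class_.

```python
from bs4 import BeautifulSoup

html = """<div>
<a class="image" href="/wiki/File:Flag_A.svg">flag A</a>
<a href="/wiki/State_A">State A</a>
<a class="image" href="/wiki/File:Flag_B.svg">flag B</a>
</div>"""
soup = BeautifulSoup(html, "html.parser")

first_link = soup.find("a")                       # first <a> only
image_links = soup.find_all("a", class_="image")  # all <a class="image">
state_links = soup.find_all("a", href="/wiki/State_A")  # filter on an attribute

for link in image_links:
    print(link["href"])
```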


Searching the Tree
 You can also call the find() and find_all() methods on a specific element
 This will search only inside the subtree rooted at that element
 For example, let’s search for links inside the table
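A sketch of searching inside a subtree, on a made-up page with one link outside the table and two inside it.

```python
from bs4 import BeautifulSoup

html = """<a href="/outside">outside</a>
<table>
<tr><td><a href="/wiki/Alabama">Alabama</a></td></tr>
<tr><td><a href="/wiki/Alaska">Alaska</a></td></tr>
</table>"""
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table")
table_links = table.find_all("a")  # searches only below the <table> element
all_links = soup.find_all("a")     # searches the whole document

print([a.text for a in table_links])
```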


Searching for Strings
 With the string argument you can search for strings instead of tags:

 You can also pass a regular expression to it:
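Both kinds of string search can be sketched on a small sample list. The results are text nodes, not tags.

```python
import re
from bs4 import BeautifulSoup

html = "<ul><li>Alabama</li><li>Alaska</li><li>Arizona</li></ul>"
soup = BeautifulSoup(html, "html.parser")

exact = soup.find_all(string="Alaska")                    # exact match
starts_with_al = soup.find_all(string=re.compile("^Al"))  # regular expression

print([str(s) for s in starts_with_al])
```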


Attributes
 You can access a tag’s attributes by treating the tag like a dictionary:

 You can access the attributes dictionary directly as .attrs:
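A short sketch on a made-up link element. Note that with the HTML parser, multi-valued attributes such as class are returned as lists.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<a id="home" class="nav main" href="/index.html">Home</a>',
                     "html.parser")
link = soup.a

href = link["href"]          # dictionary-style access
missing = link.get("title")  # .get() returns None if the attribute is absent
attributes = link.attrs      # the full attribute dictionary

print(href)
print(attributes)
```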


Scraping the Wikipedia Page
 Let’s now extract the information we need from the US states Wikipedia page
 We first locate the states table as the second table in the page:


Scraping the Wikipedia Page
 We need to skip the first two rows of the table that contain the headers


Scraping the Wikipedia Page
 By inspecting the structure of each row we find the location of the data we need
 The name of the state is in the first <th> tag, inside an <a> element
 The population size is in the fifth <td> element
 The area size is in the sixth <td> element
 However, in states where the capital is also the largest city, the location of the data changes
 The population size is in the fourth <td> element
 The area size is in the fifth <td> element
 We can identify these states by checking if the second <td> element has colspan="2"


Scraping the Wikipedia Page
 The function to get the states data:
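A sketch of such a function, following the steps described on the previous slides. It runs on a hypothetical miniature of the page (made-up state names and figures, same layout as described), not on the live Wikipedia markup, which may contain extra whitespace or footnote markers. The name get_states_data is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the page: the second table holds the states and
# its first two rows are headers. All values are made-up placeholders.
html = """
<table><tr><td>unrelated table</td></tr></table>
<table>
<tr><th>Name</th><th>Abbr.</th><th colspan="2">Cities</th><th>Date</th><th>Population</th><th>Area</th></tr>
<tr><th>Capital</th><th>Largest</th></tr>
<tr><th><a href="/wiki/A">State A</a></th><td>AA</td><td>Capital A</td><td>Big City A</td><td>1800</td><td>1,000</td><td>50</td></tr>
<tr><th><a href="/wiki/B">State B</a></th><td>BB</td><td colspan="2">City B</td><td>1900</td><td>2,500</td><td>75</td></tr>
</table>
"""

def to_int(text):
    """Convert a cell such as '1,000' to an integer."""
    return int(text.strip().replace(",", ""))

def get_states_data(soup):
    table = soup.find_all("table")[1]  # the states table is the second table
    rows = table.find_all("tr")[2:]    # skip the two header rows
    states = []
    for row in rows:
        cells = row.find_all("td")
        # If the capital is also the largest city, that cell spans two
        # columns and the following cells shift one position to the left.
        shift = 1 if cells[1].get("colspan") == "2" else 0
        states.append({
            "name": row.find("th").find("a").text,
            "population": to_int(cells[4 - shift].text),
            "area": to_int(cells[5 - shift].text),
        })
    return states

soup = BeautifulSoup(html, "html.parser")
states = get_states_data(soup)
print(states)
```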


Scraping the Wikipedia Page
 We can now build a DataFrame from this data:
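A minimal sketch, assuming the scraped data is a list of dictionaries with the keys name, population and area (the values below are made-up placeholders).

```python
import pandas as pd

states = [
    {"name": "State A", "population": 1000, "area": 50},
    {"name": "State B", "population": 2500, "area": 75},
]

# Each dictionary becomes a row; the keys become the column names.
df = pd.DataFrame(states)
df["density"] = df["population"] / df["area"]  # derived column
print(df)
```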


XML
 XML: Extensible Markup Language
 Unlike HTML:
 Designed to represent data and not UI elements
 Extensible: users can define their own tags
 Mainly used for data exchange between applications
<purchase_order>
<id>P-101</id>
<purchaser> … </purchaser>
<itemlist>
<item>
<id>RS1</id>
<description>Atom rocket sled</description>
<quantity>2</quantity>
<price>199.95</price>
</item>
<item>
<id>SG2</id>
<description>Superb glue</description>
<quantity>1</quantity>
<unit-of-measure>liter</unit-of-measure>
<price>29.95</price>
</item>
</itemlist>
<total_cost>429.85</total_cost>
</purchase_order>


Structure of XML Data
 The building blocks of an XML document are elements
 Element: a section of data beginning with <tag> and ending with a matching </tag>
 An element can contain text, attributes, and other elements
 Elements must be properly nested
 Proper nesting
 <course> … <title> … </title> … </course>
 Improper nesting
 <course> … <title> … </course> </title>
 An empty element <tag></tag> may be abbreviated as <tag/>
 Every document must have a single root element that contains all other elements
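These rules can be checked with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree, which raises ParseError for documents that are not well-formed; the helper name is_well_formed is made up.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

proper = "<course><title>Intro. to Computer Science</title></course>"
improper = "<course><title>Intro. to Computer Science</course></title>"
abbreviated = "<course><title/></course>"  # empty element, short form
two_roots = "<course/><course/>"           # no single root element

for text in (proper, improper, abbreviated, two_roots):
    print(is_well_formed(text), text)
```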


Attributes
▪ Elements can have attributes
<course course_id="CS-101">
<title>Intro. to Computer Science</title>
<dept_name>Comp. Sci.</dept_name>
<credits>4</credits>
</course>
▪ Attributes are specified by name=value pairs inside the starting tag of an element
▪ An element may have several attributes, but each attribute can only occur once
<course course_id="CS-101" credits="4">


Attributes vs. Subelements
 In the context of documents, attributes are part of the markup, while subelement contents are part of the basic document contents
 In the context of data representation, the distinction is less relevant
 Same information can be represented in two ways
 <course course_id="CS-101"> … </course>
 <course>
<course_id>CS-101</course_id> …
</course>
 Suggestion: use attributes for identifiers of elements, and store all other data as
subelements
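The two representations are read differently by a program. A small sketch with the standard xml.etree.ElementTree module:

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<course course_id="CS-101"><title>Intro. to Computer Science</title></course>')
as_subelement = ET.fromstring(
    '<course><course_id>CS-101</course_id>'
    '<title>Intro. to Computer Science</title></course>')

id_from_attribute = as_attribute.get("course_id")            # attribute lookup
id_from_subelement = as_subelement.findtext("course_id")     # child element text

print(id_from_attribute, id_from_subelement)
```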


Namespaces
 A namespace allows organizations to specify globally unique names for the elements
 A namespace is defined by an xmlns attribute in the start tag of the root element
 <root xmlns:prefix="URL">…</root>
 Typically, the URL of the organization’s web site is used as the namespace identifier
 The namespace prefix is prepended to the name of each tag or attribute that belongs to the namespace
 prefix:element-name
<university xmlns:yale="[Link]">
…
<yale:course>
<yale:course_id>CS-101</yale:course_id>
<yale:title>Intro. to Computer Science</yale:title>
<yale:dept_name> Comp. Sci.</yale:dept_name>
<yale:credits>4</yale:credits>
</yale:course>
…
</university>
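A sketch of reading namespaced elements with the standard xml.etree.ElementTree module. The namespace URI http://example.org/yale is a placeholder chosen for the illustration, not the one from the slide.

```python
import xml.etree.ElementTree as ET

YALE_NS = "http://example.org/yale"  # placeholder namespace URI (assumption)

xml_text = f"""<university xmlns:yale="{YALE_NS}">
  <yale:course>
    <yale:course_id>CS-101</yale:course_id>
    <yale:title>Intro. to Computer Science</yale:title>
  </yale:course>
</university>"""

root = ET.fromstring(xml_text)
ns = {"yale": YALE_NS}  # maps the prefix used in queries to the URI

title = root.findtext("yale:course/yale:title", namespaces=ns)
course = root.find("yale:course", ns)

print(title)
print(course.tag)  # the parser stores the tag as {URI}local-name
```

The prefix is only shorthand: what identifies the element is the namespace URI together with the local name.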
Comparison with Relational Data
 Inefficient
 Tags, which in effect represent schema information, are repeated
 Redundant storage of data
 e.g., item descriptions may be repeated in multiple purchase orders that ordered the same item
 Better than relational tuples as a data exchange format
 Unlike relational tuples, XML data is self-documenting due to the presence of tags
 Non-rigid format: tags can be added
 XML allows nested structures
 Wide acceptance, not only in database systems, but also in browsers, tools, and applications


XML Document Schema
 Database schemas constrain what information can be stored, and the data types of
stored values
 XML documents are not required to have an associated schema
 However, schemas are very important for XML data exchange
 Otherwise, a site cannot automatically interpret data received from another site
 Two mechanisms for specifying XML schema
 Document Type Definition (DTD)
 An older format
 XML Schema
 Newer format, widely used today


XML Schema
 An XML Schema describes the structure of an XML document
 Schema definitions themselves are specified in XML syntax, using a variety of tags
defined by XML Schema
 These tags are typically prefixed by the namespace prefix xs
 Elements are specified using the <xs:element> tag
 The type of an element can be simple or complex
 XML Schema defines a number of built-in types such as string, integer, decimal and date
 e.g., <xs:element name="dept_name" type="xs:string"/>
 We can use the <xs:complexType> element to create named complex types
 <xs:sequence> defines the complex type as a sequence of elements
 Attributes are specified using the <xs:attribute> tag


XML Document for the University Data
<university>
<department dept_name="Comp. Sci.">
<building>Taylor</building>
<budget>100000</budget>
</department>
<department dept_name="Biology">
<building>Watson</building>
<budget>90000</budget>
</department>
<course course_id="CS-101" dept_name="Comp. Sci.">
<title>Intro. to Computer Science</title>
<credits>4</credits>
</course>
….
<instructor ID="10101" dept_name="Comp. Sci.">
<name>Srinivasan</name>
<salary>65000</salary>
<teaches>
<course course_id="CS-101"/>
….
</teaches>
</instructor>
….
</university>

XML Schema for the University Data
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="university" type="UniversityType"/>
<xs:element name="department">
<xs:complexType>
<xs:sequence>
<xs:element name="building" type="xs:string"/>
<xs:element name="budget" type="xs:decimal"/>
</xs:sequence>
<xs:attribute name="dept_name" type="xs:string"/>
</xs:complexType>
</xs:element>
….
<xs:element name="instructor">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="salary" type="xs:decimal"/>
<xs:element name="teaches" type="teachesType"/>
</xs:sequence>
<xs:attribute name="ID" type="xs:string"/>
<xs:attribute name="dept_name" type="xs:string"/>
</xs:complexType>
</xs:element>
….
<xs:complexType name="teachesType">
<xs:sequence>
<xs:element ref="course" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="UniversityType">
<xs:sequence>
<xs:element ref="department" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="course" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="instructor" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
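Validation against a schema can be sketched with the lxml library (Python's standard library has no XML Schema validator). The schema below is a cut-down version covering only departments, written for this illustration; it is not the full university schema.

```python
from lxml import etree

# Reduced schema: a university is a sequence of department elements.
schema_text = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="university">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="department" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="department">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="building" type="xs:string"/>
        <xs:element name="budget" type="xs:decimal"/>
      </xs:sequence>
      <xs:attribute name="dept_name" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(schema_text))

valid_doc = etree.fromstring(b"""<university>
  <department dept_name="Comp. Sci."><building>Taylor</building><budget>100000</budget></department>
</university>""")

# The budget is not a decimal number, so this document violates the schema.
invalid_doc = etree.fromstring(b"""<university>
  <department dept_name="Biology"><building>Watson</building><budget>a lot</budget></department>
</university>""")

is_valid = schema.validate(valid_doc)
is_invalid_accepted = schema.validate(invalid_doc)
print(is_valid, is_invalid_accepted)
```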


Application Program Interfaces to XML
 There are two standard APIs to XML data:
 SAX (Simple API for XML)
 Parses the XML document sequentially, one piece at a time
 Provides event handlers for parsing events
 e.g., start of element, end of element

 Need to keep track of the program’s position in the document

 DOM (Document Object Model)
 Represents the XML document as a tree structure
 Provides a variety of properties and methods for traversing the DOM tree
 Also provides methods for updating the DOM tree
 Useful for random-access applications
 Supported by many programming languages with slightly different syntaxes
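The event-driven style of SAX can be sketched with the standard xml.sax module. The handler below collects course titles and has to remember on its own whether it is currently inside a <title> element. The class name and the second course are made up for the illustration.

```python
import xml.sax

class TitleHandler(xml.sax.ContentHandler):
    """Collect course ids and the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False  # our own record of the position in the document
        self.course_ids = []
        self.titles = []

    def startElement(self, name, attrs):
        if name == "course":
            self.course_ids.append(attrs.get("course_id"))
        elif name == "title":
            self.in_title = True
            self.titles.append("")

    def characters(self, content):
        if self.in_title:
            self.titles[-1] += content  # text may arrive in several chunks

    def endElement(self, name):
        if name == "title":
            self.in_title = False

xml_text = b"""<university>
<course course_id="CS-101"><title>Intro. to Computer Science</title><credits>4</credits></course>
<course course_id="BIO-301"><title>Genetics</title><credits>4</credits></course>
</university>"""

handler = TitleHandler()
xml.sax.parseString(xml_text, handler)
print(handler.course_ids, handler.titles)
```

With DOM, by contrast, the whole document is loaded into a tree first and can then be visited in any order.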


XML Processing in Python
 Python’s interfaces for processing XML are grouped in the xml package
 The XML handling submodules are:
 xml.sax: a SAX parser
 xml.dom: the DOM API definition
 xml.dom.minidom: a minimal DOM implementation
 xml.etree.ElementTree: the ElementTree API, a simple and lightweight XML processor
 A more "Pythonic" API compared to the W3C-controlled DOM


MiniDom Example
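A minimal sketch of reading part of the university document with xml.dom.minidom, assuming the department data shown earlier. It lists each department with its building.

```python
from xml.dom import minidom

xml_text = """<university>
<department dept_name="Comp. Sci."><building>Taylor</building><budget>100000</budget></department>
<department dept_name="Biology"><building>Watson</building><budget>90000</budget></department>
</university>"""

dom = minidom.parseString(xml_text)

departments = []
for dept in dom.getElementsByTagName("department"):
    # The text of an element is stored in a child text node.
    building = dept.getElementsByTagName("building")[0].firstChild.data
    departments.append((dept.getAttribute("dept_name"), building))

print(departments)
```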


ElementTree Example
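A minimal sketch of the same kind of task with xml.etree.ElementTree, again assuming the department data shown earlier. Element text and attributes are reached directly, without going through text nodes.

```python
import xml.etree.ElementTree as ET

xml_text = """<university>
<department dept_name="Comp. Sci."><building>Taylor</building><budget>100000</budget></department>
<department dept_name="Biology"><building>Watson</building><budget>90000</budget></department>
</university>"""

root = ET.fromstring(xml_text)

# Map each department name to its budget.
budgets = {
    dept.get("dept_name"): float(dept.findtext("budget"))
    for dept in root.findall("department")
}
total_budget = sum(budgets.values())

print(budgets, total_budget)
```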


Querying and Transforming XML Data
 Translation of information from one XML schema to another
 Querying on XML data
 Above two are closely related, and handled by the same tools
 Standard XML querying/translation languages
 XPath
 Simple language consisting of path expressions
 XQuery
 An XML query language with a rich set of features
 XSLT
 Simple language designed for translation from XML to XML and XML to HTML


Tree Model of XML Data
 Query and transformation languages are based on a tree model of XML data


XPath
 XPath is a querying language for selecting nodes from an XML document
 A path expression is used to navigate and select elements from the document
 Consists of a sequence of steps separated by / or //
 / selects a child node (the first / in the path selects the root node)
 // selects all the descendant nodes (including self)
 Examples:
 /bookstore/book selects all the books
 //title selects all the title elements anywhere in the document
 /bookstore/book//title selects all the title elements anywhere under a book element
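The three example paths can be tried with the lxml library on a small bookstore document. The document is made-up sample data, written for this sketch.

```python
from lxml import etree

xml_text = b"""<bookstore>
<book><title>Harry Potter</title><author>J.K. Rowling</author></book>
<book><title>Learning XML</title><author>Erik T. Ray</author></book>
<magazine><title>Data Weekly</title></magazine>
</bookstore>"""
root = etree.fromstring(xml_text)

books = root.xpath("/bookstore/book")                # children of the root
all_titles = root.xpath("//title")                   # anywhere in the document
book_titles = root.xpath("/bookstore/book//title")   # anywhere under a book

print(len(books))
print([t.text for t in all_titles])
print([t.text for t in book_titles])
```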


XPath Predicates
 Predicates written inside [] are used to find specific nodes in the document
 They may follow any step in the path
 Index values in predicates start from 1
 Can use Boolean operators and, or, and a function not()
 A union operator | forms the union of two node sets
 Examples:
 //book[price < 25] selects books with price less than 25
 /bookstore/book[1]/title selects the title of the first book
 //book[author='J.K. Rowling']/title selects titles of books authored by J.K. Rowling
 //book[price] selects books that have a price subelement
 //book[year > 2000 and price < 20] selects books released after 2000 with price less than 20
 //book[price > 2 * discount] selects books whose price is greater than twice their discount
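Several of these predicates can be sketched with lxml on a made-up bookstore document (titles, years and prices are sample data).

```python
from lxml import etree

xml_text = b"""<bookstore>
<book><title>Harry Potter</title><author>J.K. Rowling</author><year>2005</year><price>29.99</price></book>
<book><title>Learning XML</title><author>Erik T. Ray</author><year>2003</year><price>19.95</price></book>
<book><title>Old Tales</title><author>A. Writer</author><year>1995</year><price>9.50</price></book>
</bookstore>"""
root = etree.fromstring(xml_text)

cheap = root.xpath("//book[price < 25]/title/text()")
first_title = root.xpath("/bookstore/book[1]/title/text()")  # indexes start at 1
by_author = root.xpath("//book[author='J.K. Rowling']/title/text()")
recent_cheap = root.xpath("//book[year > 2000 and price < 20]/title/text()")

print(cheap, first_title, by_author, recent_cheap)
```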


XPath Attributes
 Attributes are accessed using @

 Examples:
 /bookstore/book[1]/title/@lang selects the lang attribute of the first book’s title
 //title[@lang='en'] selects title nodes that have an attribute lang with a value of 'en'
 //title[@lang] selects title nodes that have an attribute lang
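The attribute queries can be sketched with lxml on made-up sample data in which one title has no lang attribute.

```python
from lxml import etree

xml_text = b"""<bookstore>
<book><title lang="en">Harry Potter</title></book>
<book><title lang="fr">Le Petit Prince</title></book>
<book><title>Old Tales</title></book>
</bookstore>"""
root = etree.fromstring(xml_text)

first_lang = root.xpath("/bookstore/book[1]/title/@lang")  # attribute values
english = root.xpath("//title[@lang='en']/text()")
with_lang = root.xpath("//title[@lang]")                   # title elements

print(first_lang, english, len(with_lang))
```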


XPath Functions
 XPath offers a variety of functions to filter your selections:
 Number functions: count(), sum(), round(), …
 String functions: concat(), contains(), starts-with(), substring(), …
 Boolean functions: not(), true(), false(), …
 Functions to get properties of nodes: name(), text(), position()

 Examples:
 //book/title/text() gets the titles of the books (without the enclosing <title> tags)
 count(//book) returns the number of books
 //book[contains(title, 'Harry')] selects books whose title contains 'Harry'
 //book[not(contains(title, 'Harry'))] selects books which don’t have 'Harry' in the title
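A sketch of these functions with lxml, on made-up sample data. Numeric XPath functions such as count() and sum() come back to Python as floats.

```python
from lxml import etree

xml_text = b"""<bookstore>
<book><title>Harry Potter</title><price>29.99</price></book>
<book><title>Learning XML</title><price>19.95</price></book>
<book><title>Old Tales</title><price>9.50</price></book>
</bookstore>"""
root = etree.fromstring(xml_text)

titles = root.xpath("//book/title/text()")
book_count = root.xpath("count(//book)")
total_price = round(root.xpath("sum(//book/price)"), 2)
harry = root.xpath("//book[contains(title, 'Harry')]/title/text()")
not_harry = root.xpath("//book[not(contains(title, 'Harry'))]/title/text()")

print(titles, book_count, total_price, harry, not_harry)
```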


Class Exercise
 Write XPath expressions to find the following nodes:
 Select the language of books whose price is greater than 20
 Select the title of the books that have more than one author


XPath in Python
 To run an XPath query in Python, you can use the lxml library
 pip install lxml
from lxml import etree

# Parse the XML file (the file name books.xml is an assumption)
root = etree.parse('books.xml')

# Run the XPath query
results = root.xpath('//book[price > 20]/title/text()')

# Print the results
for result in results:
    print(result)


XML Applications
 Storing data with complex structure
 e.g., user preferences, configuration files
 Storing documents and spreadsheet data
 e.g., the OpenDocument Format (ODF), used for storing OpenOffice documents, is based on XML
 Numerous other standards for a variety of applications
 e.g., ChemML, MathML
 Exchanging data between different parts of the application
 Standard for data exchange for web services
 Remote method invocation over HTTP protocol
 XML is used to represent method input and output
 Data mediation
 Common data representation format to bridge different systems
JSON
 JavaScript Object Notation
 Textual representation widely used for data exchange
 Lightweight compared to XML
 Maps directly onto native data structures (objects and arrays), so parsing is simple
 Supported by many programming languages


JSON Syntax
 JSON closely resembles the syntax of JavaScript object literals
 JSON is built on two structures:
 Objects (a collection of key/value pairs)
 Arrays (an ordered list of values)
 Supported primitive types
 Numbers
 Strings
 Booleans
 null
 Property names (keys) must be strings
 Allows only double-quoted strings


Processing JSON in Python
 The json package provides functions for encoding and decoding JSON data
 Main functions:
Function               Description
json.dump(obj, file)   Serialize obj as a JSON-formatted stream to file
json.dumps(obj)        Serialize obj to a JSON string
json.load(file)        Deserialize a file containing a JSON document to a Python object
json.loads(s)          Deserialize string s to a Python object

 Objects in the JSON document are converted into Python dictionaries

 Arrays in the JSON document are converted into Python lists
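A short round trip through all four functions. An in-memory io.StringIO buffer stands in for a file so that the sketch does not touch the disk.

```python
import io
import json

record = {"course_id": "CS-101", "title": "Intro. to Computer Science", "credits": 4}

# String-based pair: dumps / loads
as_text = json.dumps(record)
from_text = json.loads(as_text)

# File-based pair: dump / load (StringIO behaves like an open text file)
buffer = io.StringIO()
json.dump(record, buffer)
buffer.seek(0)
from_file = json.load(buffer)

print(as_text)
```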


JSON Document for the University Data
{
"departments": [
{
"dept_name": "Comp. Sci.",
"building": "Taylor",
"budget": 100000
},
{
"dept_name": "Biology",
"building": "Watson",
"budget": 90000
},
...
],
"courses": [
{
"course_id": "CS-101",
"dept_name": "Comp. Sci.",
"title": "Intro. to Computer Science",
"credits": 4
},
...
],
"instructors": [
{
"ID": "10101",
"dept_name": "Comp. Sci.",
"name": "Srinivasan",
"salary": 65000,
"teaches": ["CS-101", "CS-315", "CS-347"]
},
{
"ID": "83821",
"dept_name": "Comp. Sci.",
"name": "Brandt",
"salary": 92000,
"teaches": ["CS-190", "CS-319"]
},
...
]
}


Reading the Document in Python
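A minimal sketch, assuming a trimmed copy of the university document held in a string. For a document stored in a file, json.load(f) on the open file object would be used instead of json.loads.

```python
import json

university_json = """{
  "departments": [
    {"dept_name": "Comp. Sci.", "building": "Taylor", "budget": 100000},
    {"dept_name": "Biology", "building": "Watson", "budget": 90000}
  ],
  "instructors": [
    {"ID": "10101", "dept_name": "Comp. Sci.", "name": "Srinivasan",
     "salary": 65000, "teaches": ["CS-101", "CS-315", "CS-347"]},
    {"ID": "83821", "dept_name": "Comp. Sci.", "name": "Brandt",
     "salary": 92000, "teaches": ["CS-190", "CS-319"]}
  ]
}"""

university = json.loads(university_json)

# The result is ordinary nested dictionaries and lists.
buildings = {d["dept_name"]: d["building"] for d in university["departments"]}
courses_taught = {i["name"]: len(i["teaches"]) for i in university["instructors"]}

print(buildings)
print(courses_taught)
```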


Loading JSON into a DataFrame
 You can pass a JSON object (dictionary) directly to the DataFrame constructor
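A sketch with the instructors from the university document. A list of dictionaries yields one row per dictionary; list-valued fields such as teaches stay as Python lists inside a cell.

```python
import pandas as pd

data = {
    "instructors": [
        {"ID": "10101", "dept_name": "Comp. Sci.", "name": "Srinivasan",
         "salary": 65000, "teaches": ["CS-101", "CS-315", "CS-347"]},
        {"ID": "83821", "dept_name": "Comp. Sci.", "name": "Brandt",
         "salary": 92000, "teaches": ["CS-190", "CS-319"]},
    ]
}

df = pd.DataFrame(data["instructors"])
mean_salary = df["salary"].mean()
print(df)
```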


Web APIs
 Web APIs are services provided by web sites that allow clients to query their content
 These services can be accessed from different platforms
 e.g., web pages, desktop/mobile applications
 Most web APIs support both XML and JSON formats

 To use the API, you need to make an HTTP request for a specific URL
 They usually require API keys
 These protect the API vendor from malicious use of the service
 You must apply to get a key, and include it in your code to access the API functionality


Web APIs
 Common web APIs
 Google’s suite of APIs enables you to communicate with various Google services
 e.g., Google Search, Google Translate, Google Maps, Gmail, etc.
 Facebook suite of APIs enables you to use various parts of the Facebook ecosystem
 e.g., providing app login using Facebook login, accepting in-app payments, etc.
 Twitter API allows you to embed Twitter data on your site, e.g., your latest tweets
 Map APIs like MapQuest and Google Maps API allow you to do things with maps
 Telegram APIs allow you to embed content from Telegram channels on your site
 YouTube API allows you to embed YouTube videos on your site, search YouTube, etc.
 Pinterest API provides tools to manage Pinterest boards and pins
 Twilio API provides frameworks for building voice and video call functionality


REST Architecture
 One of the most popular ways to build server APIs is the REST architectural style
 REST stands for representational state transfer
 Defines an architectural pattern for communication between client and server
 Defines the following architectural constraints:
 Uniform interface – the server provides a uniform interface for accessing resources
 Client-server – the client and the server must be decoupled from each other
 Stateless – the server won’t maintain any state between requests from the client
 Cacheable – the data retrieved from the server should be cacheable by the client or the server
 Layered system – the client may access the server resources indirectly through other layers
such as a proxy or load balancer
 Code on demand (optional) – the server may transfer code to the client that it can run


RESTful Web Services
 Web services that follow the REST style are known as RESTful web services
 These web services expose their data through public URLs
 e.g., the URL for the GitHub REST API is [Link]
 You access the data by sending an HTTP request to that URL


API Endpoints
 A REST API exposes a set of public URLs that map to different actions on the server
 These URLs are called endpoints
 For example, a web service for product management may have the following APIs:
HTTP Method   API Endpoint             Description
GET           /products                Get a list of products
GET           /products/<product_id>   Get a single product
POST          /products                Create a new product
PUT           /products/<product_id>   Update a product
DELETE        /products/<product_id>   Delete a product
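How a method and endpoint pair maps to an action can be sketched with a toy in-memory dispatcher. This is not a real web server: the function handle, the status codes chosen, and the product data are assumptions made for the illustration.

```python
# In-memory "database" of products; the entries are sample data.
products = {1: {"name": "Atom rocket sled", "price": 199.95}}

def handle(method, path, body=None):
    """Map (HTTP method, endpoint) to an action; return (status, data)."""
    parts = path.strip("/").split("/")
    if parts[0] != "products" or len(parts) > 2:
        return 404, None
    if len(parts) == 1:  # collection endpoint: /products
        if method == "GET":
            return 200, list(products.values())
        if method == "POST":
            new_id = max(products, default=0) + 1
            products[new_id] = body
            return 201, {"id": new_id}
        return 405, None
    # item endpoint: /products/<product_id>
    if not parts[1].isdigit() or int(parts[1]) not in products:
        return 404, None
    product_id = int(parts[1])
    if method == "GET":
        return 200, products[product_id]
    if method == "PUT":
        products[product_id] = body
        return 200, body
    if method == "DELETE":
        del products[product_id]
        return 204, None
    return 405, None

print(handle("GET", "/products"))
print(handle("POST", "/products", {"name": "Superb glue", "price": 29.95}))
print(handle("GET", "/products/2"))
```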


Example: Google Books API
 The Google Books API allows clients to access the Google Books repository
 A Volume represents information about a book or a magazine
 Contains metadata, such as title, authors, publisher
 Also includes personalized data, such as whether or not it has been purchased
 To get information about volumes, you can use one of the following GET requests

 These methods apply to the public data about volumes and do not require authentication


Example: Google Books API
 For example, to search for books that contain the phrase "Data Science":
 [Link]
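Such a search URL can be built programmatically. The sketch assumes that the volumes endpoint is https://www.googleapis.com/books/v1/volumes and that the search terms go in the q parameter; build_search_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Google Books volumes collection.
BASE_URL = "https://www.googleapis.com/books/v1/volumes"

def build_search_url(query, max_results=10):
    """Return the GET URL for a volume search."""
    # urlencode escapes the parameters (spaces become '+').
    return BASE_URL + "?" + urlencode({"q": query, "maxResults": max_results})

url = build_search_url("Data Science")
print(url)
```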


Example: Google Books API
 You can also try the method directly from that page


Getting Data from URLs
 The requests module allows you to fetch data from URLs
 requests.get(url) sends an HTTP GET request to the specified URL and returns an HTTP response with all the data (status, headers, content, etc.)
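A minimal sketch of a helper around requests.get that inspects the response and decodes a JSON body. The name get_json and the example URL in the usage comment are assumptions.

```python
import requests

def get_json(url, params=None, timeout=10):
    """Send a GET request and return the decoded JSON body."""
    response = requests.get(url, params=params, timeout=timeout)
    print(response.status_code)                  # e.g. 200 on success
    print(response.headers.get("Content-Type"))  # response headers act like a dict
    response.raise_for_status()                  # raise on 4xx/5xx status codes
    return response.json()                       # parse the body as JSON

# Usage (performs a network request; the URL is an assumed example):
# data = get_json("https://www.googleapis.com/books/v1/volumes",
#                 params={"q": "Data Science"})
```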


Loading Data From JSON
 The json module provides functions for encoding and decoding JSON data
 json.load(file) converts a JSON file into a Python object (dictionary or list)
 json.loads(s) converts a JSON string into a Python object


Loading Data From JSON
 Finally, we can create a DataFrame from the dictionary we obtained from the JSON:


Loading Data From JSON
 To flatten the JSON, we can use the pandas function pandas.json_normalize():
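A sketch of flattening nested records. The response below is a hypothetical, trimmed structure in the shape the Google Books API is assumed to return (a list of items, each with a nested volumeInfo object); the values are made up.

```python
import pandas as pd

# Hypothetical, trimmed API response (sample values).
response = {
    "items": [
        {"id": "a1", "volumeInfo": {"title": "Data Science Basics",
                                    "authors": ["A. Author"], "pageCount": 200}},
        {"id": "b2", "volumeInfo": {"title": "More Data Science",
                                    "authors": ["B. Writer", "C. Writer"], "pageCount": 350}},
    ]
}

# Nested dictionaries become columns named parent.child.
flat = pd.json_normalize(response["items"])
print(flat.columns.tolist())
print(flat)
```

Passing response["items"] to the plain DataFrame constructor would instead leave each volumeInfo dictionary in a single column.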
