
Parse URLs into Components in Python
The urllib.parse module provides a standard interface for breaking Uniform Resource Locator (URL) strings into components and for combining the components back into a URL string. It also has functions to convert a "relative URL" to an absolute URL given a "base URL."
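The relative-to-absolute conversion mentioned above is handled by urljoin() in urllib.parse. The following is a minimal sketch; the base URL and the relative references are hypothetical values chosen only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical base URL, used only for illustration.
base = "https://example.com/docs/tutorial/index.html"

# A plain relative path replaces the last segment of the base path.
sibling = urljoin(base, "parse.html")

# ".." climbs one directory before the reference is resolved.
parent = urljoin(base, "../about.html")

# A reference starting with "/" replaces the whole path of the base.
rooted = urljoin(base, "/contact")

print(sibling)  # https://example.com/docs/tutorial/parse.html
print(parent)   # https://example.com/docs/about.html
print(rooted)   # https://example.com/contact
```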
This module supports the following URL schemes -
- file
- ftp
- gopher
- hdl
- http
- https
- imap
- mailto
- mms
- news
- nntp
- prospero
- rsync
- rtsp
- rtspu
- sftp
- shttp
- sip
- sips
- snews
- svn
- svn+ssh
- telnet
- wais
- ws
- wss
urlparse()
This function parses a URL into six components, returning a 6-tuple. This corresponds to the general structure of a URL. Each tuple item is a string. The components are not broken up into smaller parts (for example, the network location is a single string), and % escapes are not expanded. The return value is an instance of a subclass of tuple with the following attributes. The first six are also available by index; the last four are available by name only:
Attribute | Index | Value | Value if not present |
---|---|---|---|
scheme | 0 | URL scheme specifier | scheme parameter |
netloc | 1 | Network location part | empty string |
path | 2 | Hierarchical path | empty string |
params | 3 | Parameters for last path element | empty string |
query | 4 | Query component | empty string |
fragment | 5 | Fragment identifier | empty string |
username | | User name | None |
password | | Password | None |
hostname | | Host name (lower case) | None |
port | | Port number as integer, if present | None |
Example

    >>> from urllib.parse import urlparse
    >>> url = 'https://mail.google.com/mail/u/0/?tab=rm#inbox'
    >>> t = urlparse(url)
    >>> t
    ParseResult(scheme='https', netloc='mail.google.com', path='/mail/u/0/', params='', query='tab=rm', fragment='inbox')

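The attributes listed in the table can be read either by index or by name. The sketch below assumes a made-up URL that carries credentials, a mixed-case host, and an explicit port, so that every attribute has something to show; a bare path is parsed as well to illustrate the "not present" values.

```python
from urllib.parse import urlparse

# Hypothetical URL with credentials and an explicit port, for illustration only.
result = urlparse("https://alice:secret@Example.COM:8443/inbox;v=1?tab=rm#top")

print(result[0])        # 'https', same as result.scheme
print(result.netloc)    # 'alice:secret@Example.COM:8443', kept as one string
print(result.params)    # 'v=1'
print(result.username)  # 'alice'
print(result.password)  # 'secret'
print(result.hostname)  # 'example.com', lower-cased
print(result.port)      # 8443, an int

# With no network location, the name-only attributes are None.
bare = urlparse("/inbox")
print(bare.netloc == "", bare.hostname, bare.port)
```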
urlunparse(parts)
This function constructs a URL from a tuple as returned by urlparse(). The parts argument can be any six-item iterable.

    >>> from urllib.parse import urlunparse
    >>> urlunparse(t)
    'https://mail.google.com/mail/u/0/?tab=rm#inbox'

urlsplit(urlstring, scheme='', allow_fragments=True)
This is similar to urlparse(), but does not split the params from the URL. This function returns a 5-tuple: (addressing scheme, network location, path, query, fragment identifier).

    >>> from urllib.parse import urlsplit
    >>> urlsplit(url)
    SplitResult(scheme='https', netloc='mail.google.com', path='/mail/u/0/', query='tab=rm', fragment='inbox')

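The difference between the two functions only shows up when the last path segment carries parameters after a semicolon. A small sketch, assuming a hypothetical URL with such a segment:

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL whose last path segment has ";type=pdf" parameters.
url_with_params = "https://example.com/files/report;type=pdf?page=2#summary"

parsed = urlparse(url_with_params)
split = urlsplit(url_with_params)

# urlparse() separates the parameters from the path (6 items).
print(len(parsed), parsed.path, parsed.params)

# urlsplit() leaves them inside the path (5 items).
print(len(split), split.path)
```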
urlunsplit(parts)
This function combines the elements of a tuple as returned by urlsplit() into a complete URL as a string.
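As a rough illustration, the sketch below round-trips a URL through urlsplit() and urlunsplit(), then rebuilds a modified URL. The _replace() call comes from the named-tuple base of the result object; the example.com values in the last call are made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

url = "https://mail.google.com/mail/u/0/?tab=rm#inbox"
parts = urlsplit(url)

# Splitting and recombining gives back the original string here.
rebuilt = urlunsplit(parts)

# The result is a named tuple, so _replace() returns a modified copy.
changed = urlunsplit(parts._replace(query="tab=wm", fragment=""))

# Any five-item iterable works; these values are hypothetical.
manual = urlunsplit(("https", "example.com", "/search", "q=python", ""))

print(rebuilt)
print(changed)
print(manual)
```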
The URL quoting functions focus on taking program data and making it safe for use as URL components by quoting special characters and appropriately encoding non-ASCII text.
quote()
This function replaces special characters in a string using the %xx escape. Letters, digits, and the characters '_.-~' are never quoted. By default, '/' is also left unquoted.

    >>> from urllib.parse import quote
    >>> q = quote(url)
    >>> q
    'https%3A//mail.google.com/mail/u/0/%3Ftab%3Drm%23inbox'

quote_plus()
Like quote(), but also replaces spaces with plus signs, as required for quoting HTML form values when building up a query string to go into a URL.
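A short side-by-side sketch of the two functions on the same input. The sample text is invented for the example; note that quote_plus() also escapes '/', because '/' is not in its default set of safe characters.

```python
from urllib.parse import quote, quote_plus, unquote_plus

# Hypothetical form value containing spaces, '&', and '/'.
text = "red shoes & socks/size 9"

plain = quote(text)       # spaces become %20, '/' is kept
plus = quote_plus(text)   # spaces become '+', '/' becomes %2F

print(plain)  # red%20shoes%20%26%20socks/size%209
print(plus)   # red+shoes+%26+socks%2Fsize+9

# unquote_plus() reverses quote_plus().
print(unquote_plus(plus) == text)
```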
unquote()
This function replaces %xx escapes by their single-character equivalent.

    >>> from urllib.parse import unquote
    >>> unquote(q)
    'https://mail.google.com/mail/u/0/?tab=rm#inbox'

urlencode()
This function converts a mapping object, or a sequence of two-element tuples, to a percent-encoded ASCII text string. The resulting string is a series of key=value pairs separated by '&' characters.

    >>> from urllib.parse import urlencode
    >>> qry = {"name": "Rajeev", "salary": 20000}
    >>> urlencode(qry)
    'name=Rajeev&salary=20000'

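The sequence-of-tuples form can be sketched as follows, with made-up values. The second call uses the doseq=True argument, which expands a list value into one key=value pair per element.

```python
from urllib.parse import urlencode

# Hypothetical data: a sequence of two-element tuples instead of a dict.
pairs = [("name", "Rajeev Kumar"), ("city", "Pune")]
encoded_pairs = urlencode(pairs)

# With doseq=True, each element of a list value gets its own pair.
encoded_multi = urlencode({"tag": ["a", "b"]}, doseq=True)

print(encoded_pairs)  # name=Rajeev+Kumar&city=Pune
print(encoded_multi)  # tag=a&tag=b
```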