Parse URLs into Components in Python



The urllib.parse module provides a standard interface to break Uniform Resource Locator (URL) strings into components and to combine the components back into a URL string. It also has functions to convert a "relative URL" to an absolute URL given a "base URL."
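The relative-to-absolute conversion mentioned above is handled by urljoin(). The following is a minimal sketch; the example.com base URL and the file names are made up for illustration only.

```python
from urllib.parse import urljoin

# Hypothetical base URL, used only for illustration.
base = "https://example.com/docs/tutorial/index.html"

# A plain relative reference is resolved against the base URL's directory.
sibling = urljoin(base, "intro.html")
# ".." steps up one directory level before resolving.
parent = urljoin(base, "../faq.html")
# A reference that starts with "/" replaces the whole path.
rooted = urljoin(base, "/about")

print(sibling)  # https://example.com/docs/tutorial/intro.html
print(parent)   # https://example.com/docs/faq.html
print(rooted)   # https://example.com/about
```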

This module supports the following URL schemes:

  • file
  • ftp
  • gopher
  • hdl
  • http
  • https
  • imap
  • mailto
  • mms
  • news
  • nntp
  • prospero
  • rsync
  • rtsp
  • rtspu
  • sftp
  • shttp
  • sip
  • sips
  • snews
  • svn
  • svn+ssh
  • telnet
  • wais
  • ws
  • wss

urlparse()

This function parses a URL into six components, returning a 6-tuple. This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment. Each tuple item is a string. The components are not broken up into smaller parts (for example, the network location is a single string), and % escapes are not expanded. The return value is an instance of a subclass of tuple with the following attributes:

Attribute   Index   Value                                Value if not present
scheme      0       URL scheme specifier                 scheme parameter
netloc      1       Network location part                empty string
path        2       Hierarchical path                    empty string
params      3       Parameters for last path element     empty string
query       4       Query component                      empty string
fragment    5       Fragment identifier                  empty string
username            User name                            None
password            Password                             None
hostname            Host name (lower case)               None
port                Port number as integer, if present   None
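The attributes in the table can be read either by name or by index. Here is a short sketch, assuming a made-up URL that carries credentials and an explicit port; none of these values come from the original article.

```python
from urllib.parse import urlparse

# Hypothetical URL with a user name, password and explicit port.
parts = urlparse("https://alice:secret@Example.COM:8443/api/items?id=7#top")

print(parts.netloc)    # alice:secret@Example.COM:8443 (kept as one string)
print(parts.username)  # alice
print(parts.password)  # secret
print(parts.hostname)  # example.com (lower-cased)
print(parts.port)      # 8443 (an int)

# Index access and attribute access return the same value.
print(parts[2] == parts.path)  # True

# Attributes that are absent from the URL come back as None.
bare = urlparse("https://example.com/")
print(bare.port, bare.username)  # None None
```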

Example

>>> from urllib.parse import urlparse
>>> url = 'https://mail.google.com/mail/u/0/?tab=rm#inbox'
>>> t = urlparse(url)
>>> t
ParseResult(scheme='https', netloc='mail.google.com', path='/mail/u/0/', params='', query='tab=rm', fragment='inbox')

urlunparse(parts)

This function constructs a URL from a tuple as returned by urlparse(). The parts argument can be any six-item iterable.

>>> from urllib.parse import urlunparse
>>> urlunparse(t)
'https://mail.google.com/mail/u/0/?tab=rm#inbox'
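Because the parse result is a named tuple, one common pattern is to change a single component with _replace() and then rebuild the URL. The sketch below assumes an illustrative example.com URL and also shows that a plain six-item list works as the parts argument.

```python
from urllib.parse import urlparse, urlunparse

# Hypothetical URL, for illustration only.
original = urlparse("https://example.com/search?q=python#results")

# _replace returns a modified copy; the original tuple is unchanged.
modified = original._replace(query="q=urllib", fragment="")
rebuilt = urlunparse(modified)
print(rebuilt)  # https://example.com/search?q=urllib

# Any six-item iterable is accepted:
# (scheme, netloc, path, params, query, fragment)
from_list = urlunparse(["http", "example.com", "/index.html", "", "", ""])
print(from_list)  # http://example.com/index.html
```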

urlsplit(urlstring, scheme='', allow_fragments=True)

This is similar to urlparse(), but does not split the params from the URL. This function returns a 5-tuple: (addressing scheme, network location, path, query, fragment identifier).

>>> from urllib.parse import urlsplit
>>> urlsplit(url)
SplitResult(scheme='https', netloc='mail.google.com', path='/mail/u/0/', query='tab=rm', fragment='inbox')
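The difference between the two functions only shows up when the last path segment carries parameters after a semicolon. A small sketch, using a made-up URL:

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL whose last path segment has ";type=pdf" parameters.
url_with_params = "https://example.com/files/report;type=pdf?page=2"

parsed = urlparse(url_with_params)
split = urlsplit(url_with_params)

# urlparse separates the parameters from the path ...
print(parsed.path)    # /files/report
print(parsed.params)  # type=pdf

# ... while urlsplit leaves them inside the path.
print(split.path)     # /files/report;type=pdf
print(len(parsed), len(split))  # 6 5
```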

urlunsplit(parts)

This function combines the elements of a tuple as returned by urlsplit() into a complete URL as a string.
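As a rough illustration, splitting a URL and joining it again should give back the same string for a well-formed URL. The example.com address below is an assumption chosen for the sketch.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical URL, for illustration only.
address = "https://example.com/mail/u/0/?tab=rm#inbox"

pieces = urlsplit(address)
round_trip = urlunsplit(pieces)
print(round_trip == address)  # True

# Drop the fragment before recombining.
no_fragment = urlunsplit(pieces._replace(fragment=""))
print(no_fragment)  # https://example.com/mail/u/0/?tab=rm
```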

The URL quoting functions focus on taking program data and making it safe for use as URL components by quoting special characters and appropriately encoding non-ASCII text.

quote()

This function replaces special characters in a string using the %xx escape. Letters, digits, and the characters '_.-~' are never quoted. By default, '/' is also left unquoted, because the safe parameter defaults to '/'.

>>> from urllib.parse import quote
>>> q = quote(url)
>>> q
'https%3A//mail.google.com/mail/u/0/%3Ftab%3Drm%23inbox'

quote_plus()

Like quote(), but also replaces spaces with plus signs, as required for quoting HTML form values when building up a query string to go into a URL.
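The following sketch compares the two functions on the same input. The sample string is arbitrary; note that quote_plus() also escapes '/', since its safe parameter defaults to an empty string.

```python
from urllib.parse import quote, quote_plus

# Arbitrary sample text containing spaces, "&" and "/".
text = "rock & roll/jazz"

plain = quote(text)           # spaces become %20, "/" is kept
plus = quote_plus(text)       # spaces become "+", "/" is escaped
strict = quote(text, safe="") # quote() with no safe characters

print(plain)   # rock%20%26%20roll/jazz
print(plus)    # rock+%26+roll%2Fjazz
print(strict)  # rock%20%26%20roll%2Fjazz
```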

unquote()

This function replaces %xx escapes with their single-character equivalents.

>>> from urllib.parse import unquote
>>> unquote(q)
'https://mail.google.com/mail/u/0/?tab=rm#inbox'
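Quoting and unquoting also handle non-ASCII text, which is encoded as UTF-8 by default. A minimal round-trip sketch with an arbitrary sample string; unquote_plus() is the counterpart that additionally turns plus signs back into spaces.

```python
from urllib.parse import quote, unquote, unquote_plus

# Arbitrary sample text with a non-ASCII character.
phrase = "café au lait"

escaped = quote(phrase)
print(escaped)           # caf%C3%A9%20au%20lait
print(unquote(escaped))  # café au lait

# unquote() leaves "+" alone; unquote_plus() decodes it to a space.
print(unquote("caf%C3%A9+au+lait"))       # café+au+lait
print(unquote_plus("caf%C3%A9+au+lait"))  # café au lait
```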

urlencode()

This function converts a mapping object or a sequence of two-element tuples to a percent-encoded ASCII text string. The resulting string is a series of key=value pairs separated by '&' characters.

>>> from urllib.parse import urlencode
>>> qry = {"name": "Rajeev", "salary": 20000}
>>> urlencode(qry)
'name=Rajeev&salary=20000'
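
urlencode() also accepts a sequence of two-element tuples, and the doseq argument controls how list values are expanded. The sketch below uses invented field names and values; parse_qs() is shown as the inverse step for reading a query string back into a dictionary.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields, given as a sequence of (key, value) tuples.
pairs = [("name", "Rajeev Kumar"), ("city", "São Paulo")]
encoded = urlencode(pairs)
print(encoded)  # name=Rajeev+Kumar&city=S%C3%A3o+Paulo

# With doseq=True each item of a list value gets its own key=value pair.
multi = urlencode({"tag": ["a", "b"]}, doseq=True)
print(multi)  # tag=a&tag=b

# parse_qs reverses the encoding; every value comes back as a list.
decoded = parse_qs(encoded)
print(decoded)  # {'name': ['Rajeev Kumar'], 'city': ['São Paulo']}
```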
Updated on: 2019-07-30T22:30:25+05:30
