How Do I Download A File Over HTTP Using Python - Stack Overflow

The document discusses several ways to download files over HTTP using Python, including: 1. Using urllib2 to download files in Python 2 by opening a URL and reading the response. 2. Using urllib.urlretrieve to download files, specifying the URL and local filename. 3. Using the requests library to download files in Python 3 in a simpler way than urllib/urllib2, and showing how to check the download size. 4. Using requests along with tqdm to add a progress bar for downloading files.



How do I download a file over HTTP using Python?

I have a small utility that I use to download an MP3 from a website on a schedule and then build/update a podcast XML file, which I've obviously added to iTunes.

The text processing that creates/updates the XML file is written in Python. However, I use wget inside a Windows .bat file to download the actual MP3. I would prefer to have the entire utility written in Python.

I struggled, though, to find a way to actually download the file in Python, which is why I resorted to wget.

So, how do I download the file using Python?

python http urllib

asked Aug 22 '08 at 15:34 by Owen · edited Mar 31 at 5:11 by kilojoules

See also: How to save an image locally using Python whose URL address I already know? Martin Thoma
Mar 14 '16 at 11:24

Many of the answers below are not a satisfactory replacement for wget . Among other things, wget (1)
preserves timestamps (2) auto-determines filename from url, appending .1 (etc.) if the file already exists
(3) has many other options, some of which you may have put in your .wgetrc . If you want any of those, you
have to implement them yourself in Python, but it's simpler to just invoke wget from Python. ShreevatsaR
Sep 27 '16 at 17:22
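
For completeness, a minimal sketch of that "just invoke wget from Python" approach using the standard library's subprocess module (assuming wget is installed and on the PATH; the URL is illustrative):

import subprocess

# let wget handle naming, timestamping and retries exactly as it would from a .bat file
subprocess.check_call(['wget', '-N', 'http://www.example.com/songs/mp3.mp3'])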

18 Answers

In Python 2, use urllib2 which comes with the standard library.

import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()

This is the most basic way to use the library, minus any error handling. You can also do more
complex stuff such as changing headers. The documentation can be found here.
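
As a rough sketch of the "changing headers" part (my example, not from the answer), build a urllib2.Request with your own headers and pass it to urlopen:

import urllib2

request = urllib2.Request('http://www.example.com/',
                          headers={'User-Agent': 'my-downloader/1.0'})
response = urllib2.urlopen(request)
html = response.read()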

answered Aug 22 '08 at 15:38 by Corey · edited Jun 5 '14 at 11:06 by Deep LF

10 This won't work if there are spaces in the url you provide. In that case, you'll need to parse the url and
urlencode the path. Jason Sundram Apr 14 '10 at 21:17

47 Here is the Python 3 solution: stackoverflow.com/questions/7243750/ tommy.carstensen Feb 25 '14 at 12:09

5 Just for reference. The way to urlencode the path is urllib2.quote (see the sketch after these comments). André Puel Aug 2 '14 at 2:09

8 @JasonSundram: If there are spaces in it, it isn't a URI. Zaz Oct 1 '15 at 2:51

This does not work on windows with larger files. You need to read all blocks! Avia Oct 10 '16 at 22:47
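
A minimal sketch of the URL-quoting the comments above describe (the URL is just an illustration):

import urllib
import urllib2
import urlparse

raw_url = 'http://www.example.com/my songs/track one.mp3'
parts = urlparse.urlsplit(raw_url)
safe_path = urllib.quote(parts.path)  # percent-encode the spaces in the path only
safe_url = urlparse.urlunsplit((parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment))
response = urllib2.urlopen(safe_url)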


One more, using urlretrieve :

import urllib
urllib.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

(for Python 3+ use 'import urllib.request' and urllib.request.urlretrieve)

Yet another one, with a "progressbar"

import urllib2

url = "http://download.thinkbroadband.com/10MB.zip"

file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)

file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break

    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,

f.close()

answered Aug 22 '08 at 16:19 by PabloG · edited Aug 19 '16 at 12:34 by Paamand

1 Oddly enough, this worked for me on Windows when the urllib2 method wouldn't. The urllib2 method worked
on Mac, though. InFreefall May 15 '11 at 21:49

4 Bug: file_size_dl += block_sz should be += len(buffer) since the last read is often not a full block_sz. Also on
windows you need to open the output file as "wb" if it isn't a text file. Eggplant Jeff May 25 '11 at 17:53

1 Me too urllib and urllib2 didn't work but urlretrieve worked well, was getting frustrated - thanks :) funk-shun
Jul 12 '11 at 6:08

2 Wrap the whole thing (except the definition of file_name) with if not os.path.isfile(file_name): to
avoid overwriting podcasts! useful when running it as a cronjob with the urls found in a .html file
Sriram Murali May 1 '12 at 20:15

2 @PabloG it's a tiny bit more than 31 votes now ;) Anyway, status bar was fun so i'll +1 Cinder Mar 28 '13
at 17:35

In 2012, use the python requests library

>>> import requests


>>>
>>> url = "http://download.thinkbroadband.com/10MB.zip"
>>> r = requests.get(url)
>>> print len(r.content)
10485760

You can run pip install requests to get it.

Requests has many advantages over the alternatives because the API is much simpler. This is
especially true if you have to do authentication. urllib and urllib2 are pretty unintuitive and
painful in this case.

2015-12-30

People have expressed admiration for the progress bar. It's cool, sure. There are several off-
the-shelf solutions now, including tqdm :

from tqdm import tqdm
import requests

url = "http://download.thinkbroadband.com/10MB.zip"
response = requests.get(url, stream=True)

with open("10MB", "wb") as handle:
    for data in tqdm(response.iter_content()):
        handle.write(data)

This is essentially the implementation @kvance described 30 months ago.
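
A variant I find useful (my addition, not part of the original answer): give tqdm the Content-Length so the bar shows bytes and a percentage instead of raw chunk counts:

import requests
from tqdm import tqdm

url = "http://download.thinkbroadband.com/10MB.zip"
response = requests.get(url, stream=True)
total = int(response.headers.get('content-length', 0))  # 0 if the server omits the header

with open("10MB.zip", "wb") as handle, tqdm(total=total, unit='B', unit_scale=True) as bar:
    for chunk in response.iter_content(chunk_size=8192):
        handle.write(chunk)
        bar.update(len(chunk))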

answered May 24 '12 at 20:08 by hughdbrown · edited Dec 31 '15 at 16:45


3 How does this handle large files, does everything get stored into memory or can this be written to a file
without large memory requirement? bibstha Dec 17 '12 at 16:05

7 It is possible to stream large files by setting stream=True in the request. You can then call iter_content() on
the response to read a chunk at a time. kvance Jul 28 '13 at 17:14

6 Why would a URL library need to have a file-unzip facility? Read the file from the URL, save it, and then unzip it in whatever way floats your boat. Also, a zip file is not a 'folder' like Windows shows it; it's a file. Harel Nov 15 '13 at 16:36

1 @Ali: r.text : For text or unicode content. Returned as unicode. r.content : For binary content.
Returned as bytes. Read about it here: docs.python-requests.org/en/latest/user/quickstart hughdbrown
Jan 17 '16 at 18:44

1 +1 for requests. best lib out there and 1000x better than the native urllib2 brianSan Mar 11 '16 at 16:45

import urllib2
mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3")
with open('test.mp3','wb') as output:
    output.write(mp3file.read())

The wb in open('test.mp3','wb') opens a file (and erases any existing file) in binary mode so
you can save data with it instead of just text.

answered Aug 22 '08 at 15:58 by Grant · edited Mar 10 '16 at 17:14 by Matthew Strawbridge

23 The disadvantage of this solution is that the entire file is loaded into RAM before being saved to disk, just something to keep in mind if using this for large files on a small system like a router with limited RAM. tripplet Nov 18 '12 at 13:33

1 @tripplet so how would we fix that? Lucas Henrique Jul 30 '15 at 15:10

9 To avoid reading the whole file into memory, try passing an argument to file.read that is the number of
bytes to read. See: gist.github.com/hughdbrown/c145b8385a2afa6570e2 hughdbrown Oct 7 '15 at 16:02

@hughdbrown I found your script useful, but have one question: can I use the file for post-processing?
suppose I download a jpg file that I want to process with OpenCV, can I use the 'data' variable to keep
working? or do I have to read it again from the downloaded file? Rodrigo E. Principe Nov 16 '16 at 12:29

Use shutil.copyfileobj(mp3file, output) instead. Aurélien Ooms Nov 6 at 14:20
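
A sketch of that shutil.copyfileobj suggestion applied to the answer above; it copies the response to disk in fixed-size chunks instead of one big read():

import shutil
import urllib2

mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3")
with open('test.mp3', 'wb') as output:
    shutil.copyfileobj(mp3file, output)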

Here's how to do it in Python 3 using the standard library:

urllib.request.urlopen

import urllib.request
response = urllib.request.urlopen('http://www.example.com/')
html = response.read()

urllib.request.urlretrieve

import urllib.request
urllib.request.urlretrieve('http://www.example.com/songs/mp3.mp3', 'mp3.mp3')

answered Aug 6 '15 at 13:30 by bmaupin · edited Jul 11 at 12:09

It sure took a while, but there, finally is the easy straightforward api I expect from a python stdlib :)
ThorSummoner Aug 4 at 20:52

Exactly what I was looking for, 1 liner (or 2) Programmer Sep 29 at 17:45

An improved version of the PabloG code for Python 2/3:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import (division, absolute_import, print_function, unicode_literals)

import sys, os, tempfile, logging

if sys.version_info >= (3,):
    import urllib.request as urllib2
    import urllib.parse as urlparse
else:
    import urllib2
    import urlparse

def download_file(url, dest=None):
    """
    Download and save a file specified by url to dest directory.
    """
    u = urllib2.urlopen(url)

    scheme, netloc, path, query, fragment = urlparse.urlsplit(url)
    filename = os.path.basename(path)
    if not filename:
        filename = 'downloaded.file'
    if dest:
        filename = os.path.join(dest, filename)

    with open(filename, 'wb') as f:
        meta = u.info()
        meta_func = meta.getheaders if hasattr(meta, 'getheaders') else meta.get_all
        meta_length = meta_func("Content-Length")
        file_size = None
        if meta_length:
            file_size = int(meta_length[0])
        print("Downloading: {0} Bytes: {1}".format(url, file_size))

        file_size_dl = 0
        block_sz = 8192
        while True:
            buffer = u.read(block_sz)
            if not buffer:
                break

            file_size_dl += len(buffer)
            f.write(buffer)

            status = "{0:16}".format(file_size_dl)
            if file_size:
                status += " [{0:6.2f}%]".format(file_size_dl * 100 / file_size)
            status += chr(13)
            print(status, end="")
        print()

    return filename

if __name__ == "__main__":  # Only run if this file is called directly
    print("Testing with 10MB download")
    url = "http://download.thinkbroadband.com/10MB.zip"
    filename = download_file(url)
    print(filename)

answered May 13 '13 at 8:59 by Stan · edited Aug 6 at 6:32 by Steve Barnes

I would remove the parentheses from the first line, because it is not too old feature. Arpad Horvath May 30
'13 at 19:37

I wrote the wget library in pure Python just for this purpose. It is urlretrieve pumped up with these
features as of version 2.0.

answered Sep 25 '13 at 17:55 by anatoly techtonik

3 No option to save with custom filename ? Alex May 21 '14 at 15:29

2 @Alex added -o FILENAME option to version 2.1 anatoly techtonik Jul 10 '14 at 11:04

The progress bar does not appear when I use this module under Cygwin. Joe Coder May 6 '15 at 7:40

You should change from -o to -O to avoid confusion, as it is in GNU wget. Or at least both options should
be valid. erik Jul 17 '15 at 15:46

@eric I am not sure that I want to make wget.py an in-place replacement for real wget . The -o already
behaves differently - it is compatible with curl this way. Would a note in the documentation help to resolve the
issue? Or is it an essential feature for a utility with such a name to be command-line compatible?
anatoly techtonik Jul 17 '15 at 20:24

Use the wget module:

import wget
wget.download('url')
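
If you want to control the output name, the module also accepts an output path (the same out= parameter used in the benchmark answer near the end of this page); a small sketch:

import wget

wget.download('http://www.example.com/songs/mp3.mp3', out='mp3.mp3')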

answered Mar 25 '15 at 12:59 by Sara Santana

I agree with Corey: urllib2 is more complete than urllib and should likely be the module used if
you want to do more complex things, but to make the answers more complete, urllib is a
simpler module if you want just the basics:

import urllib
response = urllib.urlopen('http://www.example.com/sound.mp3')
mp3 = response.read()

This will work fine. Or, if you don't want to deal with the response object, you can call read()
directly:

import urllib
mp3 = urllib.urlopen('http://www.example.com/sound.mp3').read()

answered Aug 22 '08 at 15:58 by akdom

Following are the most commonly used calls for downloading files in Python:

1. urllib.urlretrieve ('url_to_file', file_name)

2. urllib2.urlopen('url_to_file')

3. requests.get(url)

4. wget.download('url', file_name)

Note: urlopen and urlretrieve are found to perform relatively poorly when downloading large
files (size > 500 MB). requests.get stores the file in memory until the download is complete.

answered Sep 19 '16 at 12:45 by Jaydev

You can get the progress feedback with urlretrieve as well:

import sys
import urllib

def report(blocknr, blocksize, size):
    current = blocknr * blocksize
    sys.stdout.write("\r{0:.2f}%".format(100.0 * current / size))

def downloadFile(url):
    print "\n", url
    fname = url.split('/')[-1]
    print fname
    urllib.urlretrieve(url, fname, report)
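
A hedged usage example, reusing the 10 MB test URL from earlier answers:

downloadFile("http://download.thinkbroadband.com/10MB.zip")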

answered Jan 26 '14 at 13:12 by Marcin Cuprjak

If you have wget installed, you can use parallel_sync.

pip install parallel_sync

from parallel_sync import wget


urls = ['http://something.png', 'http://somthing.tar.gz', 'http://somthing.zip']
wget.download('/tmp', urls)
# or a single file:
wget.download('/tmp', urls[0], filenames='x.zip', extract=True)

Doc: https://pythonhosted.org/parallel_sync/pages/examples.html

This is pretty powerful. It can download files in parallel, retry upon failure, and it can even
download files on a remote machine.

answered Nov 19 '15 at 23:48 by max

Note this is for Linux only jjj Sep 7 at 18:18

Simple yet Python 2 & Python 3 compatible way:

from six.moves import urllib


urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

answered Jun 22 at 7:59 by Akif · edited Jul 2 at 5:24

Source code can be:

import urllib
sock = urllib.urlopen("http://diveintopython.org/")
htmlSource = sock.read()
sock.close()
print htmlSource

answered Nov 26 '13 at 13:21 by Olu · edited Nov 26 '13 at 14:25 by Mailerdaimon

This may be a little late, but I saw PabloG's code and couldn't help adding os.system('cls') to
make it look AWESOME! Check it out:

import urllib2, os

url = "http://download.thinkbroadband.com/10MB.zip"

file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)
os.system('cls')
file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break

    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,

f.close()

If running in an environment other than Windows, you will have to use something other than
'cls'. On Mac OS X and Linux it should be 'clear'.
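
A minimal sketch of picking the right command per platform (my addition, using the same os.system approach as the answer):

import os

os.system('cls' if os.name == 'nt' else 'clear')  # 'cls' on Windows, 'clear' on Mac OS X / Linux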

answered Oct 14 '13 at 2:54 by JD3 · edited May 16 at 16:46

3 cls doesn't do anything on my OS X nor on an Ubuntu server of mine. Some clarification could be
good. Kasper Souren Sep 24 '14 at 21:57

I think you should use clear for linux, or even better replace the print line instead of clearing the whole
command line output. Arijoon Jan 21 '15 at 1:01

3 this answer just copies another answer and adds a call to a deprecated function ( os.system() ) that
launches a subprocess to clear the screen using a platform specific command ( cls ). How does this have
any upvotes?? Utterly worthless "answer" IMHO. Corey Goldberg Dec 11 '15 at 19:56

urlretrieve and requests.get are simple, but in reality they don't always suffice. I have fetched data from a couple
of sites, including text and images, and the two above probably solve most of those tasks. But for a
more universal solution I suggest using urlopen. As it is included in the Python 3 standard
library, your code can run on any machine that runs Python 3 without pre-installing third-party
packages. (In the sketch below, url, headers, filename and buffer_size are placeholder values.)

import urllib.request

url = "http://www.example.com/songs/mp3.mp3"  # placeholder values
filename = "mp3.mp3"
buffer_size = 8192
headers = {'User-Agent': 'Mozilla/5.0'}  # a custom User-Agent is what usually avoids HTTP 403

url_request = urllib.request.Request(url, headers=headers)
url_connect = urllib.request.urlopen(url_request)
len_content = url_connect.length  # size reported by the server

# remember to open the file in bytes mode
with open(filename, 'wb') as f:
    while True:
        buffer = url_connect.read(buffer_size)
        if not buffer:
            break

        # f.write returns the number of bytes written
        data_wrote = f.write(buffer)

# you could also manage url_connect with a with-statement instead
url_connect.close()

This approach also provides a solution to HTTP 403 Forbidden errors when downloading a file over HTTP with
Python. I have only tried the requests and urllib modules; other modules may provide something
better, but this is the one I used to solve most of the problems.

answered Mar 13 at 13:12 by Sphynx-HenryAY

I wrote the following, which works in vanilla Python 2 or Python 3.

import sys
try:
    import urllib.request
    python3 = True
except ImportError:
    import urllib2
    python3 = False

def progress_callback_simple(downloaded, total):
    sys.stdout.write(
        "\r" +
        (len(str(total))-len(str(downloaded)))*" " + str(downloaded) + "/%d"%total +
        " [%3.2f%%]"%(100.0*float(downloaded)/float(total))
    )
    sys.stdout.flush()

def download(srcurl, dstfilepath, progress_callback=None, block_size=8192):
    def _download_helper(response, out_file, file_size):
        if progress_callback!=None: progress_callback(0,file_size)
        if block_size == None:
            buffer = response.read()
            out_file.write(buffer)

            if progress_callback!=None: progress_callback(file_size,file_size)
        else:
            file_size_dl = 0
            while True:
                buffer = response.read(block_size)
                if not buffer: break

                file_size_dl += len(buffer)
                out_file.write(buffer)

                if progress_callback!=None: progress_callback(file_size_dl,file_size)
    with open(dstfilepath,"wb") as out_file:
        if python3:
            with urllib.request.urlopen(srcurl) as response:
                file_size = int(response.getheader("Content-Length"))
                _download_helper(response,out_file,file_size)
        else:
            response = urllib2.urlopen(srcurl)
            meta = response.info()
            file_size = int(meta.getheaders("Content-Length")[0])
            _download_helper(response,out_file,file_size)

import traceback
try:
    download(
        "https://geometrian.com/data/programming/projects/glLib/glLib%20Reloaded%200.5.9/0.5.9.zip",
        "output.zip",
        progress_callback_simple
    )
except:
    traceback.print_exc()
    input()

Notes:

Supports a "progress bar" callback.


The download is a 4 MB test .zip from my website.

answered May 13 at 21:33 by imallett · edited May 13 at 21:52

If speed matters to you, I made a small performance test for the modules urllib and wget,
and regarding wget I tried once with a status bar and once without. I took three different 500 MB
files to test with (different files, to eliminate the chance that there is some caching going on
under the hood). Tested on a Debian machine, with Python 2.

First, these are the results (they are similar in different runs):

$ python wget_test.py
urlretrive_test : starting
urlretrive_test : 6.56
==============
wget_no_bar_test : starting
wget_no_bar_test : 7.20
==============
wget_with_bar_test : starting
100% [......................................................................] 541335552 /
541335552
wget_with_bar_test : 50.49
==============

The way I performed the test is by using a "profile" decorator. This is the full code:

import wget
import urllib
import time
from functools import wraps

def profile(func):
    @wraps(func)
    def inner(*args):
        print func.__name__, ": starting"
        start = time.time()
        ret = func(*args)
        end = time.time()
        print func.__name__, ": {:.2f}".format(end - start)
        return ret
    return inner

url1 = 'http://host.com/500a.iso'
url2 = 'http://host.com/500b.iso'
url3 = 'http://host.com/500c.iso'

def do_nothing(*args):
    pass

@profile
def urlretrive_test(url):
    return urllib.urlretrieve(url)

@profile
def wget_no_bar_test(url):
    return wget.download(url, out='/tmp/', bar=do_nothing)

@profile
def wget_with_bar_test(url):
    return wget.download(url, out='/tmp/')

urlretrive_test(url1)
print '=============='
time.sleep(1)

wget_no_bar_test(url2)
print '=============='
time.sleep(1)

wget_with_bar_test(url3)
print '=============='
time.sleep(1)

urllib seems to be the fastest

answered Nov 3 at 14:25 by Omer Dagan


