How Do I Download A File Over HTTP Using Python - Stack Overflow
I have a small utility that downloads an MP3 from a website on a schedule and then builds/updates a podcast XML file, which I've
obviously added to iTunes.
The text processing that creates/updates the XML file is written in Python. However, I use wget inside a Windows .bat file to download the actual MP3.
I would prefer to have the entire utility written in Python.
I struggled to find a way to actually download the file in Python, which is why I resorted to wget.
See also: How to save an image locally using Python whose URL address I already know? Martin Thoma
Mar 14 '16 at 11:24
Many of the answers below are not a satisfactory replacement for wget . Among other things, wget (1)
preserves timestamps (2) auto-determines filename from url, appending .1 (etc.) if the file already exists
(3) has many other options, some of which you may have put in your .wgetrc . If you want any of those, you
have to implement them yourself in Python, but it's simpler to just invoke wget from Python. ShreevatsaR
Sep 27 '16 at 17:22
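For anyone who wants point (2) from the comment above without shelling out to wget, here is a minimal Python 3 sketch. The helper names filename_from_url and unique_path and the "index.html" fallback are my own assumptions, not part of any library, and timestamps are not handled.

```python
import os
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="index.html"):
    """Return the last path segment of the URL, or a default if it is empty."""
    path = urlsplit(url).path
    name = os.path.basename(unquote(path))
    return name or default


def unique_path(directory, name):
    """Append .1, .2, ... to the name until the resulting path does not exist."""
    candidate = os.path.join(directory, name)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, "{}.{}".format(name, counter))
        counter += 1
    return candidate
```

Calling unique_path(".", filename_from_url(url)) before opening the output file should avoid overwriting an earlier download.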
18 Answers
import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()
This is the most basic way to use the library, minus any error handling. You can also do more
complex stuff such as changing headers. The documentation can be found here.
10 This won't work if there are spaces in the url you provide. In that case, you'll need to parse the url and
urlencode the path. Jason Sundram Apr 14 '10 at 21:17
5 Just for reference. The way to urlencode the path is urllib2.quote Andr Puel Aug 2 '14 at 2:09
8 @JasonSundram: If there are spaces in it, it isn't a URI. Zaz Oct 1 '15 at 2:51
This does not work on windows with larger files. You need to read all blocks! Avia Oct 10 '16 at 22:47
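To illustrate the comments about spaces: in Python 3 the quoting function lives in urllib.parse. The sketch below percent-encodes only the path of the URL; quote_url_path is a made-up helper name, and keeping "%" in the safe set is my assumption so that already-encoded URLs are not encoded twice.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url_path(url):
    """Percent-encode the path component of a URL and leave the rest alone."""
    parts = urlsplit(url)
    # safe="/%" keeps path separators and existing %XX escapes intact.
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))
```

The result can then be passed to urlopen in place of the raw string.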
https://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python 1/8
11/10/2017 How do I download a file over HTTP using Python? - Stack Overflow
import urllib
urllib.urlretrieve ("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
import urllib2
url = "http://download.thinkbroadband.com/10MB.zip"
file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)
file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break
    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,
f.close()
1 Oddly enough, this worked for me on Windows when the urllib2 method wouldn't. The urllib2 method worked
on Mac, though. InFreefall May 15 '11 at 21:49
4 Bug: file_size_dl += block_sz should be += len(buffer) since the last read is often not a full block_sz. Also on
windows you need to open the output file as "wb" if it isn't a text file. Eggplant Jeff May 25 '11 at 17:53
1 Me too urllib and urllib2 didn't work but urlretrieve worked well, was getting frustrated - thanks :) funk-shun
Jul 12 '11 at 6:08
2 Wrap the whole thing (except the definition of file_name) with if not os.path.isfile(file_name): to
avoid overwriting podcasts! useful when running it as a cronjob with the urls found in a .html file
Sriram Murali May 1 '12 at 20:15
2 @PabloG it's a tiny bit more than 31 votes now ;) Anyway, status bar was fun so i'll +1 Cinder Mar 28 '13
at 17:35
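Putting the comments above together, a rough Python 3 sketch of the same loop might look like this. It counts len(block) rather than the block size, opens the output as "wb", and skips the download if the file already exists. The names copy_in_blocks and download_once are hypothetical, and a missing Content-Length header is assumed to mean "no percentage".

```python
import os
import sys
import urllib.request


def copy_in_blocks(src, dst, total=None, block_size=8192, out=sys.stdout):
    """Copy src to dst block by block, printing progress; return bytes copied."""
    copied = 0
    while True:
        block = src.read(block_size)
        if not block:
            break
        dst.write(block)
        # Count the bytes actually read: the last block is usually short.
        copied += len(block)
        if total:
            status = "{:10d} [{:6.2f}%]".format(copied, copied * 100.0 / total)
        else:
            status = "{:10d}".format(copied)
        # "\r" returns to the start of the line so the status overwrites itself.
        out.write("\r" + status)
        out.flush()
    out.write("\n")
    return copied


def download_once(url, file_name):
    """Download url to file_name unless the file already exists."""
    if os.path.isfile(file_name):
        return False
    with urllib.request.urlopen(url) as response:
        length = response.headers.get("Content-Length")
        total = int(length) if length else None
        with open(file_name, "wb") as f:
            copy_in_blocks(response, f, total=total)
    return True
```

Because copy_in_blocks only needs objects with read() and write(), it can be tried out on in-memory buffers before pointing it at a real URL.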
Requests has many advantages over the alternatives because the API is much simpler. This is
especially true if you have to do authentication. urllib and urllib2 are pretty unintuitive and
painful in this case.
2015-12-30
People have expressed admiration for the progress bar. It's cool, sure. There are several
off-the-shelf solutions now, including tqdm:
import requests
url = "http://download.thinkbroadband.com/10MB.zip"
response = requests.get(url, stream=True)
3 How does this handle large files, does everything get stored into memory or can this be written to a file
without large memory requirement? bibstha Dec 17 '12 at 16:05
7 It is possible to stream large files by setting stream=True in the request. You can then call iter_content() on
the response to read a chunk at a time. kvance Jul 28 '13 at 17:14
6 Why would a url library need to have a file unzip facility? Read the file from the url, save it and then unzip it
in whatever way floats your boat. Also a zip file is not a 'folder' like it shows in windows, Its a file. Harel
Nov 15 '13 at 16:36
1 @Ali: r.text : For text or unicode content. Returned as unicode. r.content : For binary content.
Returned as bytes. Read about it here: docs.python-requests.org/en/latest/user/quickstart hughdbrown
Jan 17 '16 at 18:44
1 +1 for requests. best lib out there and 1000x better than the native urllib2 brianSan Mar 11 '16 at 16:45
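To make the stream=True / iter_content() comment concrete, here is a small sketch. requests is a third-party package; save_stream and download are names I made up, and the 30-second timeout is an arbitrary assumption.

```python
def save_stream(response, path, chunk_size=8192):
    """Write a streamed response to disk chunk by chunk; return bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            written += len(chunk)
    return written


def download(url, path):
    # Imported here so save_stream can be used without the third-party package.
    import requests

    # stream=True defers fetching the body until iter_content() is consumed.
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        return save_stream(response, path)
```

With this shape only one chunk at a time should be held in memory, which addresses the large-file concern raised above.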
import urllib2
mp3file = urllib2.urlopen("http://www.example.com/songs/mp3.mp3")
with open('test.mp3','wb') as output:
    output.write(mp3file.read())
The wb in open('test.mp3','wb') opens a file (and erases any existing file) in binary mode so
you can save data with it instead of just text.
23 The disadvantage of this solution is, that the entire file is loaded into ram before saved to disk, just
something to keep in mind if using this for large files on a small system like a router with limited ram.
tripplet Nov 18 '12 at 13:33
1 @tripplet so how would we fix that? Lucas Henrique Jul 30 '15 at 15:10
9 To avoid reading the whole file into memory, try passing an argument to file.read that is the number of
bytes to read. See: gist.github.com/hughdbrown/c145b8385a2afa6570e2 hughdbrown Oct 7 '15 at 16:02
@hughdbrown I found your script useful, but have one question: can I use the file for post-processing?
suppose I download a jpg file that I want to process with OpenCV, can I use the 'data' variable to keep
working? or do I have to read it again from the downloaded file? Rodrigo E. Principe Nov 16 '16 at 12:29
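As a sketch of the "read a limited number of bytes at a time" advice, the Python 3 standard library can do the chunked copy for you with shutil.copyfileobj. The function name download_to_file and the 16 KB block size are my own choices, not taken from the linked gist.

```python
import shutil
import urllib.request


def download_to_file(url, path, block_size=16 * 1024):
    """Stream the response body to disk without holding it all in memory."""
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        # copyfileobj reads and writes block_size bytes at a time.
        shutil.copyfileobj(response, out, block_size)
    return path
```

Memory use should stay around one block regardless of file size, which matters on small systems like the router mentioned above.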
urllib.request.urlopen
import urllib.request
response = urllib.request.urlopen('http://www.example.com/')
html = response.read()
urllib.request.urlretrieve
import urllib.request
urllib.request.urlretrieve('http://www.example.com/songs/mp3.mp3', 'mp3.mp3')
It sure took a while, but there, finally is the easy straightforward api I expect from a python stdlib :)
ThorSummoner Aug 4 at 20:52
Exactly what I was looking for, 1 liner (or 2) Programmer Sep 29 at 17:45
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import ( division, absolute_import, print_function, unicode_literals )
    filename = os.path.basename(path)
    if not filename:
        filename = 'downloaded.file'
    if dest:
        filename = os.path.join(dest, filename)
    file_size_dl = 0
    block_sz = 8192
    while True:
        buffer = u.read(block_sz)
        if not buffer:
            break
        file_size_dl += len(buffer)
        f.write(buffer)
        status = "{0:16}".format(file_size_dl)
        if file_size:
            status += " [{0:6.2f}%]".format(file_size_dl * 100 / file_size)
        status += chr(13)
        print(status, end="")
    print()
    return filename
I would remove the parentheses from the first line, because it is not too old feature. Arpad Horvath May 30
'13 at 19:37
I wrote a wget library in pure Python just for this purpose. It is a pumped-up urlretrieve with these
features as of version 2.0.
2 @Alex added -o FILENAME option to version 2.1 anatoly techtonik Jul 10 '14 at 11:04
The progress bar does not appear when I use this module under Cygwin. Joe Coder May 6 '15 at 7:40
You should change from -o to -O to avoid confusion, as it is in GNU wget. Or at least both options should
be valid. erik Jul 17 '15 at 15:46
@eric I am not sure that I want to make wget.py an in-place replacement for real wget . The -o already
behaves differently - it is compatible with curl this way. Would a note in documentation help to resolve the
issue? Or it is the essential feature for an utility with such name to be command line compatible?
anatoly techtonik Jul 17 '15 at 20:24
import wget
wget.download('url')
I agree with Corey, urllib2 is more complete than urllib and should likely be the module used if
you want to do more complex things, but to make the answers more complete, urllib is a
simpler module if you want just the basics:
import urllib
response = urllib.urlopen('http://www.example.com/sound.mp3')
mp3 = response.read()
Will work fine. Or, if you don't want to deal with the "response" object you can call read()
directly:
import urllib
mp3 = urllib.urlopen('http://www.example.com/sound.mp3').read()
The following are the most commonly used calls for downloading files in Python:
1. urllib.urlretrieve('url_to_file', file_name)
2. urllib2.urlopen('url_to_file')
3. requests.get(url)
4. wget.download('url', file_name)
Note: urlopen and urlretrieve have been found to perform relatively poorly when downloading large
files (size > 500 MB). requests.get stores the file in memory until the download is complete, unless stream=True is set.
def downloadFile(url):
    print "\n",url
    fname = url.split('/')[-1]
    print fname
    urllib.urlretrieve(url, fname, report)
Doc: https://pythonhosted.org/parallel_sync/pages/examples.html
This is pretty powerful. It can download files in parallel, retry upon failure, and even
download files on a remote machine.
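I don't want to guess at the parallel_sync API, so the following is NOT that library. It is only a standard-library sketch of the same two ideas (parallel downloads and retry upon failure) using concurrent.futures. The names fetch and fetch_all, the retry count and the delay are all assumptions, and nothing here covers the remote-machine feature.

```python
import shutil
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, path, retries=3, delay=1.0):
    """Download url to path, retrying on OSError (URLError is a subclass)."""
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(url) as response, open(path, "wb") as out:
                shutil.copyfileobj(response, out)
            return path
        except OSError:
            if attempt == retries:
                raise
            time.sleep(delay)


def fetch_all(jobs, max_workers=4):
    """jobs is a list of (url, path) pairs; returns the paths in the same order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda job: fetch(*job), jobs))
```

Threads are a reasonable fit here because the work is I/O bound; the last failure is re-raised so the caller can see which download gave up.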
import urllib
sock = urllib.urlopen("http://diveintopython.org/")
htmlSource = sock.read()
sock.close()
print htmlSource
This may be a little late, but I saw pabloG's code and couldn't help adding an os.system('cls') to
make it look AWESOME! Check it out:
import urllib2,os
url = "http://download.thinkbroadband.com/10MB.zip"
file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)
os.system('cls')
file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break
    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,
f.close()
If running in an environment other than Windows, you will have to use something other than
'cls'. In Mac OS X and Linux it should be 'clear'.
3 cls doesn't do anything on my OS X or nor on an Ubuntu server of mine. Some clarification could be
good. Kasper Souren Sep 24 '14 at 21:57
I think you should use clear for linux, or even better replace the print line instead of clearing the whole
command line output. Arijoon Jan 21 '15 at 1:01
3 this answer just copies another answer and adds a call to a deprecated function ( os.system() ) that
launches a subprocess to clear the screen using a platform specific command ( cls ). How does this have
any upvotes?? Utterly worthless "answer" IMHO. Corey Goldberg Dec 11 '15 at 19:56
urlretrieve and requests.get are simple, but reality is not. I have fetched data from a couple of
sites, including text and images, and those two calls probably cover most tasks. For a
more universal solution, though, I suggest urlopen. Because it is included in the Python 3 standard
library, your code can run on any machine that runs Python 3 without pre-installing site-par
import urllib.request
# url and headers must be defined by the caller
url_request = urllib.request.Request(url, headers=headers)
url_connect = urllib.request.urlopen(url_request)
len_content = url_connect.length
This answer provides a solution to HTTP 403 Forbidden when downloading a file over HTTP using
Python. I have tried only the requests and urllib modules; other modules may provide something
better, but this is the one I used to solve most of the problems.
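The header values used in the answer are not shown, so this is only a guess at the idea: a common cause of 403 is a server rejecting the default Python user agent, and sending a browser-like User-Agent through urllib.request.Request often gets past it. The header value and the names build_request and read_url are hypothetical.

```python
import urllib.request

# Hypothetical header value; some servers reject the default Python user agent.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-downloader)"}


def build_request(url, headers=None):
    """Return a Request carrying the default headers plus any overrides."""
    merged = dict(DEFAULT_HEADERS)
    merged.update(headers or {})
    return urllib.request.Request(url, headers=merged)


def read_url(url, headers=None):
    """Fetch the URL with the custom headers and return the body as bytes."""
    with urllib.request.urlopen(build_request(url, headers)) as response:
        return response.read()
```

If the server still answers 403, the cause is probably something other than the user agent (cookies, referer, or real access control).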
import sys
try:
    import urllib.request
    python3 = True
except ImportError:
    import urllib2
    python3 = False
def progress_callback_simple(downloaded,total):
    sys.stdout.write(
        "\r" +
        (len(str(total))-len(str(downloaded)))*" " + str(downloaded) + "/%d"%total +
        " [%3.2f%%]"%(100.0*float(downloaded)/float(total))
    )
    sys.stdout.flush()
# ... (lines missing in the source)
            if progress_callback!=None: progress_callback(file_size,file_size)
        else:
            file_size_dl = 0
            while True:
                buffer = response.read(block_size)
                if not buffer: break
                file_size_dl += len(buffer)
                out_file.write(buffer)
                if progress_callback!=None: progress_callback(file_size_dl,file_size)
    with open(dstfilepath,"wb") as out_file:
        if python3:
            with urllib.request.urlopen(srcurl) as response:
                file_size = int(response.getheader("Content-Length"))
                _download_helper(response,out_file,file_size)
        else:
            response = urllib2.urlopen(srcurl)
            meta = response.info()
            file_size = int(meta.getheaders("Content-Length")[0])
            _download_helper(response,out_file,file_size)
import traceback
try:
    download(
        "https://geometrian.com/data/programming/projects/glLib/glLib%20Reloaded%200.5.9/0.5.9.zip",
        "output.zip",
        progress_callback_simple
    )
except:
    traceback.print_exc()
    input()
Notes:
If speed matters to you, I made a small performance test for the modules urllib and wget,
and for wget I tried once with the status bar and once without. I used three different 500MB
files for the test (different files, to eliminate the chance of some caching going on
under the hood). Tested on a Debian machine with Python 2.
First, these are the results (they are similar across runs):
$ python wget_test.py
urlretrive_test : starting
urlretrive_test : 6.56
==============
wget_no_bar_test : starting
wget_no_bar_test : 7.20
==============
wget_with_bar_test : starting
100% [......................................................................] 541335552 / 541335552
wget_with_bar_test : 50.49
==============
I ran the test using a "profile" decorator. This is the full code:
import wget
import urllib
import time
from functools import wraps
def profile(func):
    @wraps(func)
    def inner(*args):
        print func.__name__, ": starting"
        start = time.time()
        ret = func(*args)
        end = time.time()
        print func.__name__, ": {:.2f}".format(end - start)
        return ret
    return inner
url1 = 'http://host.com/500a.iso'
url2 = 'http://host.com/500b.iso'
url3 = 'http://host.com/500c.iso'
def do_nothing(*args):
    pass
@profile
def urlretrive_test(url):
    return urllib.urlretrieve(url)
@profile
def wget_no_bar_test(url):
    return wget.download(url, out='/tmp/', bar=do_nothing)
@profile
def wget_with_bar_test(url):
    return wget.download(url, out='/tmp/')
urlretrive_test(url1)
print '=============='
time.sleep(1)
wget_no_bar_test(url2)
print '=============='
time.sleep(1)
wget_with_bar_test(url3)
print '=============='
time.sleep(1)