
Clone Webpage Using PyWebCopy in Python
The third-party pywebcopy module lets us download an entire website, including its HTML pages, images and other files, and store it on our machine. One of its functions, save_webpage(), clones a single webpage together with the files that page links to.
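Cloning a page means fetching its HTML and then every file it links to. As a rough illustration of that idea (not pywebcopy's actual implementation), the sketch below uses only the standard library to list the image, script and stylesheet URLs referenced by a small, made-up HTML snippet. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images, scripts and <link> resources in a page."""

    # Tag -> attribute holding the linked resource (an illustrative subset).
    RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return  # e.g. <a> links point to other pages, not page resources
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page URL.
                self.resources.append(urljoin(self.base_url, value))


def collect_resources(html, base_url):
    parser = ResourceCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.resources


page = """<html><head><link rel="stylesheet" href="/css/site.css">
<script src="app.js"></script></head>
<body><img src="images/logo.png"><a href="/about">About</a></body></html>"""

print(collect_resources(page, "https://example.com/docs/"))
# ['https://example.com/css/site.css', 'https://example.com/docs/app.js',
#  'https://example.com/docs/images/logo.png']
```

A real cloner would then download each of these URLs and rewrite the links in the saved HTML so they point at the local copies.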
Installing pywebcopy module
First, install the pywebcopy module into the Python environment with pip:
pip install pywebcopy
On a successful installation, pip prints output similar to the following:
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pywebcopy
  Downloading pywebcopy-7.0.2-py2.py3-none-any.whl (46 kB)
...
Installing collected packages: pywebcopy
Successfully installed pywebcopy-7.0.2
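To confirm the installation from inside Python, for example in a notebook where the pip output has scrolled away, the standard library's importlib.metadata (Python 3.8+) can look up the installed distribution. This is a minimal sketch; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of *package*, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


pywebcopy_version = installed_version("pywebcopy")
if pywebcopy_version is None:
    print("pywebcopy is not installed; run: pip install pywebcopy")
else:
    print("pywebcopy", pywebcopy_version, "is installed")
```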
Syntax
The following is the syntax of the save_webpage() function of the pywebcopy module.
from pywebcopy import save_webpage

kwargs = {'bypass_robots': True, 'project_name': 'example'}
save_webpage(url, folder, **kwargs)
Where,
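kwargs is a dictionary of optional keyword arguments that control how the webpage is downloaded.
bypass_robots controls whether the site's robots.txt file is respected: True ignores its rules, while False follows them and skips files the site disallows.
project_name is the name of the project; it is used for the subfolder that holds the downloaded files.
save_webpage() is the function that downloads the webpage and saves it locally.
url is the address of the webpage to clone.
folder is the directory in which the downloaded files are saved.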
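In the syntax above, url and folder are passed positionally, and **kwargs unpacks the dictionary into keyword arguments. The sketch below shows that mapping with a hypothetical stub, save_webpage_stub(), which records its arguments instead of downloading anything. Its parameter names (project_folder, project_name, bypass_robots) are an assumption based on pywebcopy 7's save_webpage() and may differ in other releases.

```python
def save_webpage_stub(url, project_folder=None, project_name=None,
                      bypass_robots=None):
    """Hypothetical stand-in that only reports the arguments it receives.

    The parameter names are assumed to mirror pywebcopy 7's save_webpage().
    """
    return {
        "url": url,
        "project_folder": project_folder,
        "project_name": project_name,
        "bypass_robots": bypass_robots,
    }


url = "https://www.python.org/"
folder = "Articles/March 2023"
kwargs = {"bypass_robots": True, "project_name": "example"}

# **kwargs expands the dictionary into keyword arguments, so this call is
# equivalent to save_webpage_stub(url, folder, bypass_robots=True,
#                                 project_name="example").
print(save_webpage_stub(url, folder, **kwargs))
# {'url': 'https://www.python.org/', 'project_folder': 'Articles/March 2023',
#  'project_name': 'example', 'bypass_robots': True}
```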
Example
In the following example, we pass the webpage URL, the folder in which to store the files, and additional keyword arguments to the save_webpage() function. The webpage is then saved in that folder under the specified project name.
from pywebcopy import save_webpage

url = 'https://www.tutorialspoint.com/'
folder = 'Desktop/March 2023'
# Ignore robots.txt and use 'sample_webpage' as the project name
kwargs = {'bypass_robots': True, 'project_name': 'sample_webpage'}

save_webpage(url, folder, **kwargs)
print("webpage saved in the location:", folder)
Output
When we run the above code, the following output is generated:
webpage saved in the location: Desktop/March 2023
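The final print() call only echoes the folder name; it does not prove that anything was downloaded. A minimal way to inspect the result is to walk the output folder with os.walk() and list every file found. The exact directory layout pywebcopy produces is not described here, so the sketch makes no assumption about it. The helper list_saved_files() is hypothetical, and the demo builds a fake tree in a temporary directory so it runs without pywebcopy or network access.

```python
import os
import tempfile


def list_saved_files(folder):
    """Return the relative paths of all files stored under *folder*, sorted."""
    found = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            found.append(os.path.relpath(path, folder))
    return sorted(found)


# Self-contained demo: build a small fake download tree in a temp directory.
# In practice you would pass the folder given to save_webpage() instead.
with tempfile.TemporaryDirectory() as demo:
    os.makedirs(os.path.join(demo, "sample_webpage", "images"))
    for rel in ("sample_webpage/index.html", "sample_webpage/images/logo.png"):
        with open(os.path.join(demo, *rel.split("/")), "w") as f:
            f.write("placeholder")
    for rel_path in list_saved_files(demo):
        print(rel_path)
```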
Example
Here is another example, this time saving python.org with bypass_robots set to False so that the site's robots.txt rules are respected:
from pywebcopy import save_webpage

url = 'https://www.python.org/'
folder = 'Articles/March 2023'
# Respect robots.txt and use 'webpage' as the project name
kwargs = {'bypass_robots': False, 'project_name': 'webpage'}

save_webpage(url, folder, **kwargs)
print("webpage saved in the location:", folder)
Output
The following is the output of saving the webpage:
webpage saved in the location: Articles/March 2023
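The second example sets bypass_robots to False, which asks pywebcopy to honor the site's robots.txt. To see what such rules mean, the following sketch evaluates a made-up robots.txt with the standard library's urllib.robotparser. It illustrates the robots.txt convention only; it is not how pywebcopy checks the file internally, and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt, parsed from a string so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("https://example.com/index.html",
            "https://example.com/private/report.html"):
    allowed = parser.can_fetch("*", url)
    print(url, "->", "allowed" if allowed else "disallowed")
# https://example.com/index.html -> allowed
# https://example.com/private/report.html -> disallowed
```

A crawler that respects these rules would download the first URL and skip the second.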