Remove URLs from string in Python

Last Updated : 5 Dec, 2025

Given a string that contains one or more URLs, the goal is to remove those URLs and replace each one with a placeholder such as [URL REMOVED]. For example:

Input: Visit our site at https://www.geeksforgeeks.org/ for tutorials.
Output: Visit our site at [URL REMOVED] for tutorials.
Explanation: The URL is detected inside the text and replaced with the placeholder [URL REMOVED].

Let's explore different ways to remove URLs from a string in Python.

Using re.sub()

This method replaces all URLs in one go. re.sub() scans the entire string for matches of the regex pattern and directly substitutes them with the replacement text.

Python
import re

txt = "Visit: https://www.geeksforgeeks.org/ for more info."
p = r'https?://\S+|www\.\S+'

res = re.sub(p, "[URL REMOVED]", txt)
print(res)

Output
Visit: [URL REMOVED] for more info.

Explanation:

  • re.sub(p, "[URL REMOVED]", txt) finds every URL matching pattern p and replaces it.
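  • Pattern https?://\S+|www\.\S+ matches links starting with http://, https:// or www., followed by every non-whitespace character up to the next space (so punctuation directly after a link, such as a sentence-ending period, is removed with it).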
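Because \S+ stops only at whitespace, a URL followed directly by punctuation loses that punctuation along with the link. Below is a minimal sketch of one way around this. It assumes that trailing . , ; : ! ? ) characters belong to the sentence rather than the URL (an assumption that breaks links which genuinely end in ")"). The helper name remove_urls is hypothetical:

```python
import re

# Assumption: trailing . , ; : ! ? ) belong to the sentence, not to the URL.
# \S+? grows lazily until the lookahead sees only such punctuation
# followed by whitespace or the end of the string.
URL_PATTERN = re.compile(r'(?:https?://|www\.)\S+?(?=[.,;:!?)]*(?:\s|$))', re.IGNORECASE)

def remove_urls(text, placeholder="[URL REMOVED]"):
    """Replace every URL in text with placeholder, keeping trailing punctuation."""
    return URL_PATTERN.sub(placeholder, text)

txt = "See https://www.geeksforgeeks.org/. Also try www.python.org, it helps."
res = remove_urls(txt)
print(res)  # See [URL REMOVED]. Also try [URL REMOVED], it helps.
```

The lazy \S+? combined with the lookahead is what allows the match to end just before the trailing punctuation. The original greedy pattern would have consumed the period and the comma as part of the link.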

Using re.findall()

This method first extracts all URLs using re.findall(), then replaces each one within the text using str.replace(). It is useful when you also need the list of extracted URLs.

Python
import re

txt = "Python tutorials: https://www.geeksforgeeks.org/category/python/"
p = r'https?://\S+|www\.\S+'

urls = re.findall(p, txt)
for u in urls:
    txt = txt.replace(u, "[URL REMOVED]")

print(txt)

Output
Python tutorials: [URL REMOVED]

Explanation:

  • re.findall(p, txt) returns all URL matches as a list.
  • The loop replaces each found URL using txt.replace(u, "[URL REMOVED]").
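One caveat with this loop is that str.replace() works on plain substrings. If one URL is a prefix of another, the shorter one is also replaced inside the longer one, and part of the longer link is left behind. The sketch below keeps findall() to collect the list but performs the replacement in a single sub() pass. Note that example.com is only a placeholder domain:

```python
import re

p = re.compile(r'https?://\S+|www\.\S+')
txt = "Docs: https://example.com and https://example.com/python"

urls = p.findall(txt)  # the list of URLs, if you need it

# Naive loop: "https://example.com" is also a prefix of the second URL
naive = txt
for u in urls:
    naive = naive.replace(u, "[URL REMOVED]")

# Single pass: each match is replaced exactly where it was found
clean = p.sub("[URL REMOVED]", txt)

print(urls)   # ['https://example.com', 'https://example.com/python']
print(naive)  # Docs: [URL REMOVED] and [URL REMOVED]/python
print(clean)  # Docs: [URL REMOVED] and [URL REMOVED]
```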

Using re.search()

This method removes one URL at a time by calling search() repeatedly on a compiled pattern (created with re.compile()). It is useful when you need full control over how each match is replaced.

Python
import re

txt = "Visit https://www.geeksforgeeks.org/ for info."
p = re.compile(r'https?://\S+|www\.\S+')

while True:
    m = p.search(txt)
    if not m:
        break
    txt = txt[:m.start()] + "[URL REMOVED]" + txt[m.end():]

print(txt)

Output
Visit [URL REMOVED] for info.

Explanation:

  • p.search(txt) finds the first URL still left in the string.
  • m.start() and m.end() give the exact position of the URL.
  • The replacement is done manually by slicing around the match: txt[:m.start()] is the text before the URL, "[URL REMOVED]" is the replacement, and txt[m.end():] is the text after it.
  • The loop continues until no URLs remain.
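re.sub() also accepts a function in place of the replacement string. That function is called once per match with the match object, so it provides the same per-match control as the manual loop, but in a single pass. In the sketch below, the placeholder format that keeps the host name is an illustrative assumption, as is the choice to prepend "http://" to www. links before parsing them:

```python
import re
from urllib.parse import urlparse

p = re.compile(r'https?://\S+|www\.\S+')

def replace_url(m):
    """Called by sub() for each match; returns the text to put in its place."""
    url = m.group(0)
    # Assumption: www. links have no scheme, so prepend one so urlparse finds the host
    host = urlparse(url if "://" in url else "http://" + url).netloc
    return f"[URL REMOVED: {host}]"

txt = "Visit https://www.geeksforgeeks.org/ or www.python.org/doc for info."
res = p.sub(replace_url, txt)
print(res)  # Visit [URL REMOVED: www.geeksforgeeks.org] or [URL REMOVED: www.python.org] for info.
```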

Using urllib.parse

This method splits the text into words and uses urlparse() from the urllib.parse module to check whether each word looks like a URL, replacing the words that do. It works without regex by analyzing the structure of each word.

Python
from urllib.parse import urlparse

txt = "Check https://www.geeksforgeeks.org/ for resources."
words = txt.split()

for i, w in enumerate(words):
    p = urlparse(w)
    if p.scheme and p.netloc:
        words[i] = "[URL REMOVED]"

res = " ".join(words)
print(res)

Output
Check [URL REMOVED] for resources.

Explanation:

  • urlparse(w) breaks the word into parts like scheme and domain.
  • If both p.scheme and p.netloc exist, the word is considered a URL.
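  • The word is then replaced in the list, and the list is joined back into a string.
  • Note that split() followed by " ".join() collapses runs of whitespace (including newlines) into single spaces, and links without a scheme, such as www.geeksforgeeks.org/, are not detected.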
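If the original spacing matters, one option is to split on whitespace while keeping the separators, then let urlparse() decide which pieces are URLs. The sketch below uses re.split() for the tokenizing step only, so it is no longer regex-free. The helper names looks_like_url and remove_urls_keep_spacing are hypothetical:

```python
import re
from urllib.parse import urlparse

def looks_like_url(word):
    """Same test as the example above: a URL needs both a scheme and a netloc."""
    parts = urlparse(word)
    return bool(parts.scheme and parts.netloc)

def remove_urls_keep_spacing(text, placeholder="[URL REMOVED]"):
    # The capturing group makes re.split() also return the whitespace runs,
    # so joining the pieces with "" restores the original spacing.
    pieces = re.split(r'(\s+)', text)
    return "".join(placeholder if looks_like_url(p) else p for p in pieces)

txt = "Check https://www.geeksforgeeks.org/\nfor  resources."
res = remove_urls_keep_spacing(txt)
print(res)
```

The newline and the double space survive, whereas " ".join() would have turned both into single spaces. As with the original loop, a bare www. link is parsed entirely as a path (empty scheme and netloc), so it is left untouched.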