The Largest Proxy & Web Data Collection Network: NetNut | SourceForge Podcast, episode #55

By Community Team

NetNut’s vast network of over 85 million rotating residential proxies delivers unmatched speed and anonymity, effortlessly transforming any website into structured data for seamless web scraping. With zero IP blocks, global coverage, and a user-friendly dashboard, NetNut empowers businesses to scale data collection while ensuring cost-effective, reliable performance.

Try the Google SERP Scraper API today!

In this episode of the SourceForge Podcast, we speak with Anna Sharma, Marketing Communications Manager at NetNut, about the evolving landscape of SEO in the age of AI. We discuss how traditional search engines are being challenged by AI tools, the importance of real-time data for AI models, and the role of proxy services in data collection and security. Anna shares insights on NetNut’s products, the significance of AI overviews for brands, and the future of web data collection.

Watch the podcast here:

Listen to audio only here:


Learn more about NetNut.

Interested in appearing on the SourceForge Podcast? Contact us here.


Show Notes

Takeaways

  • SEO is facing challenges from AI tools.
  • NetNut provides essential infrastructure for web data collection.
  • The traditional search engine model is shifting to AI-driven interfaces.
  • Real-time data is crucial for keeping AI models updated.
  • Building an AI-powered search engine requires robust infrastructure.
  • Brands must optimize for visibility in AI-generated results.
  • Proxy services enhance data collection and security.
  • There are underutilized features in NetNut’s offerings.
  • The demand for proxy services is expected to grow.
  • Understanding AI’s logic is essential for brands.

Chapters

00:00 – Introduction to the Digital Dilemma
01:31 – Understanding NetNut’s Role in Web Data Collection
04:16 – The Shift from Traditional Search Engines to AI
06:35 – Keeping AI Models Updated with Real-Time Data
09:36 – Building an AI-Powered Search Engine
13:43 – NetNut’s Key Products and Services
17:31 – The Importance of AI Overviews for Brands
19:51 – Enhancing Data Collection and Security with Proxies
25:51 – Underutilized Features of NetNut
32:23 – The Future of Proxy Services and Web Data Collection

Transcript

Beau Hamilton (00:05)
Hello everyone and welcome to the SourceForge Podcast. Thank you for joining us today. I’m your host, Beau Hamilton, Senior Editor and Multimedia Producer here at SourceForge, the world’s most visited software comparison site where B2B software buyers compare and find business software solutions.

Today’s episode discusses a topic that’s keeping digital marketers, content creators, and web developers up at night: will SEO survive the AI wave, and are traditional search engines living on borrowed time as AI tools rise to power? To help us untangle this digital dilemma, we’re joined by someone who’s right at the center of the web intelligence world, Anna Sharma, Marketing Communications Manager at NetNut, the world’s largest proxy and web data collection network.

NetNut powers the data behind everything from competitive research to SEO intelligence. And I would say Anna has a front row seat to the industry shifts happening in how we find, rank, and interpret information online. So with that said, let me introduce Anna Sharma. Anna, welcome to the podcast. Glad you could join us.

Anna Sharma (01:04)
Thank you Beau, thanks for having me. I’m really happy to join and talk about a burning issue that really keeps everyone on their toes.

Beau Hamilton (01:14)
Yeah, it’s true. We’ve got a lot to talk about. Now I just want to get right into it, so maybe we can lay the foundation first. First of all, what is NetNut, and what is its mission within the proxy services and web data collection industry? Can you break it all down for us?

Anna Sharma (01:31)
Well, basically you can think of it as the internet for enterprises, because when a regular person searches the web, it’s pretty easy to understand: you just go to Google, you write your search, and you get the results. When a company searches the web for data, it’s a little more complicated, because to understand the full picture of what your competitors are offering, or what your position is in that same search engine, you have to try that request from multiple locations, from multiple devices, probably at different times of the day, to get the real offers and the real information. And that requires proxies. Proxies basically enable you to collect web data at scale.
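To make that concrete, here is a minimal Python sketch of the idea: cycling the same request through different exit countries so a target site sees it from multiple locations. The gateway hostname, port, and credential format below are placeholders for illustration, not NetNut’s actual scheme.

```python
from itertools import cycle

def proxy_for(country: str) -> dict:
    # Build a requests-style proxies mapping for one exit country.
    # The URL format here is a made-up placeholder; real providers
    # document their own gateway hostnames and credential syntax.
    endpoint = f"http://user-{country}:password@gw.example-proxy.net:5959"
    return {"http": endpoint, "https": endpoint}

def rotating_proxies(countries=("us", "de", "be")):
    # Yield a fresh proxies dict per request, cycling exit countries
    # so the same query appears to come from multiple locations.
    for country in cycle(countries):
        yield proxy_for(country)

rotation = rotating_proxies()
first = next(rotation)   # request 1 exits from a US IP
second = next(rotation)  # request 2 exits from a German IP
```

A scraper would then pass `proxies=next(rotation)` to each `requests.get(...)` call, so consecutive requests to the same page exit from different countries.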

Beau Hamilton (02:20)
Gotcha. I appreciate that breakdown. So I know NetNut was founded in 2017, before web scraping really became a household term used in connection with AI and training large language models. In a lot of ways, I would say your company was founded at almost the perfect time to deal with some of the key trends we face today, especially around AI, automation, and AI agents.

How did NetNut get started, and how has the company evolved to meet all these growing demands around, you know, artificial intelligence, of course, but also around anonymity and insight extraction?

Anna Sharma (03:00)
Well, I think 2017 was the time when companies suddenly realized that they had the largest database out there, publicly available, at their fingertips. All they had to do was go and take it: organize it, structure it, clean it, analyze it, and turn it into insight. But when they started trying to do that, they quickly realized it’s far more complex than just searching Google, and it requires a lot of IPs in order to scale and bypass the limitations that sites impose on searches.

Beau Hamilton (03:42)
Yeah, it’s an interesting time, that’s for sure, because the internet for the last couple of decades has been built around traditional search engines, namely Google. We all go to Google for information, and now that’s changing for the first time. As a result, the whole internet kind of gets shaken up, because you no longer just have the traditional search engine results pages, or SERPs; you have these chatbots that deliver answers for you.

So I’m curious, what are some of the bigger industry shifts you are seeing right now? I know I just mentioned a few, and I also imagine privacy laws and anti-bot detection tech are changing the game quite a bit. But broadly speaking, what is your inside perspective on what’s going on behind the scenes with some of these tools and AI chatbots?

Anna Sharma (04:37)
So definitely the shift is happening, and a lot of companies right now are wondering whether they still have to invest in SEO, because ranking number one no longer means money. Before the AI era, the higher you ranked, the more traffic you got. Now you have to shift your paradigm from being seen on the first results page to guessing what kind of prompt the end user is going to ask, so that your website appears in the results of that search. And that is a somewhat more complicated task.

Beau Hamilton (05:16)
So AI is really shaking things up for search engines and the way people gather online data. What kind of challenges are you seeing in the industry right now and how is NetNut helping companies just stay ahead of it all?

Anna Sharma (00:30)
Well, the largest challenge is that ranking doesn’t mean traffic anymore. Basically, being cited by AI is more important than being number one in the search results.

And the challenge is that it’s not so visible. You don’t know how the AI brain works. And that’s where NetNut’s SERP API becomes critical, because it allows you to see not only the search results, but also the new feature that Google introduced called AI overview.

And this is where you can see if your website, if your business, has been seen by AI, if it has been cited by AI. You can also evaluate what kinds of resources the AI takes into account and which places it gives more credibility. That gives you an answer to the strategic question of where you should put your content, and it’s often not your own website.

So for everyone out there who is asking, should we invest more in SEO? I have an answer: not in SEO ranking anymore. Your ranking doesn’t bring you the clicks anymore. Invest in your SEO visibility instead.

Make sure that those long-tail requests that people make with your commercial keywords are answered by AI, and that the AI quotes your website. So you make your website into a library of expert advice on the topic you’re an expert in. And also put effort into being present everywhere AI trusts, starting from Wikipedia and ending with, I don’t know, the forum where your readers live.

Beau Hamilton (07:20)
Right. Yeah. I know one of the many challenges with large language models is that they’re often trained on static data when they themselves are dynamic and predictive. How do companies keep AI models up to date, especially when it comes to fast-changing content like search results?

Anna Sharma (07:39)
Well, that’s where they really need a proxy service like NetNut, because we are the infrastructure layer that enables AI companies to get real-time data from the search engines we all know. Basically, I think the search engines are turning from being the product into being the infrastructure. I don’t think Google or Bing are going to die, but they are becoming commodity infrastructure for the AI tools that will actually do the intelligent search for us and be the new customer interface for search.

And this is where the proxy comes into play. For an AI tool to run multiple searches for the request the user asks, it needs multiple ports, you know, multiple IPs, to serve that search and collect the correct data, because context matters. If, for example, I make a search request from Brussels, Belgium, but the AI server is actually located somewhere in the United States and the AI is making requests on my behalf, it wants the results to be relevant for me living in Brussels. Or maybe I even ask in a different language: I can ask in German, in Russian, in English, in any other language.

And so the AI has to take that into account. It has to choose the right country, meaning the right IP. It has to choose the right device. It basically has to take my user fingerprint and replicate it in its own request. And this is where the NetNut proxy comes into play, and especially our SERP API tool, which lets you set exactly those parameters.
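The fingerprint replication Anna describes can be sketched in a few lines of Python: pick a proxy exit matching the user’s country, and mirror their language and device in the request headers. Everything here, including the proxy URL scheme and the header values, is illustrative rather than any provider’s real API.

```python
def replicate_fingerprint(user: dict) -> dict:
    # Build per-request settings that mirror the end user's context:
    # exit country (via the proxy), language, and device class (via
    # headers). The gateway URL format is a made-up placeholder.
    endpoint = f"http://user-{user['country']}:password@gw.example-proxy.net:5959"
    headers = {
        "Accept-Language": user["language"],
        "User-Agent": user["user_agent"],
    }
    return {
        "proxies": {"http": endpoint, "https": endpoint},
        "headers": headers,
    }

# A searcher in Brussels asking in German from an iPhone:
settings = replicate_fingerprint({
    "country": "be",
    "language": "de-BE,de;q=0.9",
    "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)",
})
```

A request library call such as `requests.get(url, **settings)` would then surface results localized the way the real user would see them.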

Beau Hamilton (09:30)
That’s great. Okay. So I want to ask you about some of the specific products you offer, but before we do that, I want to get a more kind of conceptual view of some of the tools required for building an AI search engine or assistant. Like, let’s say, let’s say I were going to be, you know, set out on building an AI powered search engine or assistant on my own. What kind of tools or data sources would I need to make that happen?

Anna Sharma (09:56)
Well, basically you will need a very good software engineer who will design the scrapers themselves. The software itself is not that complicated; it just has to be properly written. But then it has to be connected to a powerful infrastructure, one that knows how to deal with multiple requests running at the same time. Because you don’t have time: if I ask ChatGPT right now how to solve this or that problem, I will not wait half an hour for it to go and check each source one by one, like a human would. When we search the web, what do we do? We open Google, we get 10 links, and we go link by link by link. Then we decide, okay, this is relevant, this is not relevant. We copy and paste, we make some kind of memory note for ourselves, or we find the solution maybe on page 20.

But the AI cannot allow itself such a long process. So when the user request comes in, it has to visit all those search results simultaneously. Say one user request asks, I don’t know, how to change the focus setting on my camera, right? If you go to Google, it will give you search results spanning maybe 100 pages. You will probably scroll through three or four, and then you will find something like what you’re looking for.

But the AI opens all those links simultaneously. The requests run in parallel and the thinking runs in parallel. Out of all those pages, it takes the information and makes the judgment calls that constitute its intelligence, and it gives us an almost instant answer. With the newest models, in the beginning it was seconds; right now it sometimes takes half a minute or a minute, because they enabled what they call deep search. And deep search is exactly that: the AI goes out there, checks the web, reads the pages, and returns a snapshot of what it found.
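The parallel visit she describes is straightforward to sketch. This Python example fans a list of result URLs out to a thread pool instead of fetching them one by one; the `fetch` function is a stand-in for a real page download (for example, `requests.get` routed through a proxy).

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    # Stand-in for a real page download; a production version would
    # issue an HTTP request through the proxy layer and return the
    # page text for the model to read.
    return f"<content of {url}>"

def fetch_all(urls: list) -> list:
    # Visit every search result at once instead of link by link,
    # the way an AI-driven search does. pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=20) as pool:
        return list(pool.map(fetch, urls))

pages = fetch_all([f"https://example.com/result/{i}" for i in range(10)])
```

With real I/O, the wall-clock time approaches that of the single slowest page rather than the sum of all of them, which is what makes the near-instant answer possible.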

Beau Hamilton (12:08)
Interesting. Yeah, it’s really interesting to think about: you’re sending out these workers, essentially, going out there and reading these web pages for you and reporting back, all within milliseconds, really. I mean, you mentioned deep search, which takes a little longer because more research is being done. That’s kind of a fascinating development in the AI revolution we’re in, so I’m curious to see how that works out.

But thanks for breaking down that conceptual view. So, in regards to my last question about developing a search engine myself: outside of hiring a software engineer or developer, which would be very important and something I would definitely want to consider, I might look into some of the products that NetNut offers. Can you talk about some of the key products and services you offer today, and maybe also mention who benefits the most from them?

Anna Sharma (13:08)
Well, the hottest product we have right now is the API for Google search results. One of the things we are really proud of is that we are also the fastest: the response time, from when you send the request to when you get the response, is milliseconds. That is a huge advantage that we have. And secondly, we can also show you the AI overview. Right now Google is trying to fight this shift to AI chats, so they’re trying to turn the traditional search panel into a prompt window. You type your request, however long it is, in the form of a prompt, and Google will do its best to search.

And before all the websites you’re used to seeing traditionally, they give you the AI overview. They try to give you the best experience, you know, like: what is it exactly that you’re looking for? So they’re kind of trying to be ChatGPT and Google at the same time, and they use their own LLM for that, Gemini.

So what’s interesting about it for businesses is that it’s actually the best tool to understand how AI sees your business. For example, take your commercial keywords and try them with our SERP API tool with this Google feature, the AI overview, enabled. This is exactly how you’re going to see how AI tools are seeing your business, or if they see it at all. Do they see your product? Do they know that you provide these kinds of services? Would they recommend them, and what kinds of resources are they using to come to that conclusion? This is the best use case.
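As a sketch of that workflow: once a SERP response with an AI overview comes back as parsed JSON, checking whether your domain is among the cited sources is a small loop. The response shape below is hypothetical; a real SERP API’s JSON layout will differ, so treat the field names as assumptions.

```python
def cited_in_ai_overview(serp_response: dict, domain: str) -> bool:
    # Walk the sources listed in the AI-overview block of a parsed
    # SERP response and report whether our domain is among them.
    # Field names ("ai_overview", "sources", "url") are hypothetical.
    overview = serp_response.get("ai_overview") or {}
    sources = overview.get("sources", [])
    return any(domain in src.get("url", "") for src in sources)

# Hypothetical parsed response for one commercial keyword:
sample = {
    "ai_overview": {
        "text": "Residential proxies route traffic through real devices...",
        "sources": [
            {"url": "https://en.wikipedia.org/wiki/Proxy_server"},
            {"url": "https://example-brand.com/blog/what-is-a-proxy"},
        ],
    }
}
visible = cited_in_ai_overview(sample, "example-brand.com")
```

Running this across a keyword list over time gives exactly the visibility signal Anna describes: is the brand cited, and which resources does the AI lean on instead.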

If you’re running a business and it’s online and it depends on SEO, this is the thing you have to do. You have to adopt it as soon as possible, because this is the new SEO. You can get full visibility into whether AI tools see you and whether they trust your brand, because they have a different logic than the search engine. And this little snippet helps you understand that: not only where you’re ranking, but also whether your website is seen, whether your brand is seen, or maybe you will find out that your competitor is already out there holding the place.

Beau Hamilton (15:27)
Right. Yeah. No, it’s interesting to think about. And I think if brands or marketing people are listening to this right now, it’s something to seriously consider if you’re not already thinking about it, because it’s true. A lot of us use one of these chatbots, whether it’s Gemini or ChatGPT or Claude. And if you’re using Google, I think we’ve all seen the AI overviews, and a lot of times you’re just using that AI overview; you’re not even really looking at the results anymore. So if you’re a brand and you’re not being placed in those results, you’re not getting any traffic. It’s really going to hurt your reach.

Anna Sharma (16:08)
Basically, if before AI you were optimizing for ranking, right now you’re optimizing for being cited. And understanding the logic of the AI is a challenge, because even the creators of the LLMs are not always sure how the machine makes a decision. It’s a black box. So seeing it this way gives you some intelligence, so you can say, okay, maybe I need to be quoted on these resources. It doesn’t have to be your website. Maybe people will never visit your website again; they will do everything they want from the chat.

But you still have to maintain it, because right now you’re writing it basically for the AI. It has to be clean, it has to be structured, it has to have deep customer value, like answers to the core questions that people have about the product. So invest in the content, content that is useful for the customer, and make sure that while investing in it, you monitor what people are actually searching for through this SERP API tool, and how the AI sees you, and whether it sees you at all.

Beau Hamilton (17:21)
Yeah, invest in the tools. Start thinking about some of these tools that are out there to help you get mentioned in these overviews and maximize your compatibility with these AI tools. So you mentioned NetNut has the SERP API tool, and generally speaking, you have a pretty comprehensive stack of products, everything from data collection APIs to unblockers to custom datasets.

And the fact that it services everyone from AI researchers to cybersecurity teams, I think just shows its versatility, which I think is a neat selling point. But how do the different types of proxies NetNut offers actually enhance data collection and security across various use cases? Can you talk about that?

Anna Sharma (18:09)
Well, we all know that the internet is not only a nice place. Most users just know there is the internet, and that’s it. But there is also the dark web, and there are all kinds of bad actors doing DDoS attacks, etcetera. For security teams, to protect yourself, your company, or your assets from these kinds of attacks, you need to try to replicate the attack on your own website. So this is one of the use cases for proxies: security teams use our infrastructure to run stress tests and penetration tests on their systems, to find the vulnerabilities, from different devices and with different settings. That’s one of the things.

And second, a lot of OSINT and intelligence teams are using proxies to monitor the dark web. One of our customers, whom I cannot name, is actually using a proxy to monitor whether the corporate data of their clients has appeared on the dark web. And, you know, they have a protocol for how to respond, etcetera, etcetera. So they have a portfolio of clients who trust them to monitor the dark web for them.

And when we talk about brands, brands use proxies as well, for brand protection. Imagine how luxury brands suffer from multiple unauthorized resellers. Especially when it comes to sneakers, a lot of resellers just buy the stock the first second it’s out and then resell it at a 300 to 500 percent markup. Proxies basically allow brands that want to protect their intellectual property and their brand to mimic the real users those pirates are targeting. Fashion and consumer electronics are two of the industries that suffer the most.

Beau Hamilton (20:05)
Well, you’ve given me a lot to think about, and I appreciate all those examples. My mind was going down different rabbit holes thinking about all these use cases for proxies, the good use cases, everything from stress testing and penetration testing the security of one’s website to some of the more nefarious activities on the dark web, which we won’t get into, but which is interesting nonetheless.

You obviously offer a lot of different core offerings, some of which you’ve mentioned, but are there any hidden gems or maybe underutilized features that you offer and would recommend clients take advantage of?

Anna Sharma (20:54)
Well, I still see a lot of companies investing a lot of resources into developing their own scraping APIs, and it’s just not necessary anymore. We have a lot of scraping APIs that can be used plug-and-play, at a fraction of the cost. Basically, it’s not something new that we have invented; we just offer it as an additional service layer on top of the proxy infrastructure.

And it has a lot of benefits for the end customer, because it is much more cost-effective. Secondly, it has native integration with the largest proxy infrastructure out there, plus the benefits of speed and support from our engineering team. We have an amazing team that actually works as an extension of your own. We are partners in development more than an out-of-the-box product.

Everyone has a unique need. Somebody wants to get information about used cars from classified websites; this is a very unique request, you know, it’s nuts, but they do have intelligence they want to extract from that. They want to understand.

There was a company that was going into a new market in Europe. It’s a grocery store, and they wanted to bring in a new sort of apple, so they needed to compare apples, literally apples to apples. And it’s not like consumer electronics, which has, you know, a series and a model number, etcetera. So you have to collect the data in a certain way: the pictures, the descriptions, the prices, trying to understand if those apples are the same apples.

So every customer is different, and there are also more challenging and less challenging websites. For example, it’s very difficult to get data from Shopee, the fastest-growing retail platform in Asia right now. Every brand dreams of being there, and they dream of understanding who is already there, what they’re doing, what they’re selling, how they’re selling it, and what their product is. It’s a huge challenge to do, but we offer that as well. This is one of the most requested features.

And yeah, on top of the API, if your task is ongoing real-time data collection, then take a look at the unblocker. This is something that can save your team a lot of time if you don’t ever want to face a captcha, or have your data flow suddenly interrupted because, for example, the website decided to change its layout. This is what happened not so long ago, I think about a year ago, when Facebook suddenly decided to change its layout and all the web scrapers that were working stopped in one day.

The unblocker has some artificial intelligence of its own. It’s a self-healing tool that can quickly adapt to dynamically changing websites, and it’s one of the best tools on the market. The web scraping lab called it number one because it delivers almost a 99% success rate.
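A full self-healing unblocker is well beyond a snippet, but the core idea, falling through an ordered list of known layouts instead of failing the moment a site changes, can be sketched like this. The patterns and field are hypothetical examples, not how any real unblocker is implemented.

```python
import re

# Ordered fallback patterns: when a site changes its layout the first
# pattern stops matching and the next one takes over. A real tool
# would learn new patterns automatically; here they are hand-written.
PRICE_PATTERNS = [
    r'<span class="price">([^<]+)</span>',  # old page layout
    r'data-price="([^"]+)"',                # new page layout
]

def extract_price(html: str):
    # Try each known layout in order; return the first match,
    # or None to signal that every known layout has failed.
    for pattern in PRICE_PATTERNS:
        m = re.search(pattern, html)
        if m:
            return m.group(1)
    return None

old_layout = '<span class="price">$19.99</span>'
new_layout = '<div data-price="21.50">Apples, 1 kg</div>'
```

The design point is the `None` at the end: a scraper built this way degrades to a clear failure signal rather than silently returning wrong data when the site ships a third layout.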

Beau Hamilton (24:02)
Wow, that’s pretty good. It’s hard to beat that. So you’ve obviously got a lot of tools, and it’s interesting to think about helping all these different customers with all their different specific needs. We’re all so focused on ecommerce, consumer electronics, and some of these more digital tech products, but the apple example is great: these are brands and businesses selling products that don’t have serial numbers or a lot of data out there on the web. It makes it a little more challenging to work with them, and that goes for a lot of different brands out there.

Well, okay, looking ahead: obviously AI and automation are accelerating. It’s the next big frontier. We’ve talked about how traditional SEO is shifting into this generative engine optimization, AI-driven world. So what does the future of proxy services and web data collection look like from your perspective? Where is this all headed?

Anna Sharma (25:17)
Well, I think the future is very promising, because the hunger for data is only growing. Where web data used to be a very niche thing reserved for online retailers, ecommerce, and maybe security intelligence teams, right now even you and I are scrapers. Every time you go to ChatGPT or Gemini to ask, I don’t know, what is the best restaurant to go to with your girlfriend or boyfriend, we are scraping the web. We just don’t do it by writing code and launching it from our machines; we are directing the AI to do it for us.

So the demand for proxy infrastructure will only grow, unless something changes, like the scenario where, for example, Google decides to really shift from being a customer product into becoming search engine infrastructure, sees AI tools as its primary customers, and starts serving access to its index as a product.

That’s where I personally think the industry will go, but I don’t know when it will happen. I don’t see it happening in the next couple of years, but that would be the most natural thing for them to do. It’s already happening at the customer level. Whether they want it or not, people are already searching Google through ChatGPT. They search it through Gemini, they search it through DeepSeek. They’re searching it.

Beau Hamilton (26:46)
Yeah, it’s a neat vision to imagine, that’s for sure. There’s definitely some uncertainty from a business perspective, but I’d say definitely some excitement from a customer standpoint. Ultimately, being able to get really detailed answers to very specific questions is always a plus from an individual standpoint, but obviously there’s a lot of change behind the scenes. And I think you’ve opened my eyes to a lot of it, so I appreciate that.

And I think it’s clear that, to use an analogy, NetNut isn’t just riding the wave of this AI world we’re entering; you guys are kind of helping build the surfboard, so to speak. So I think that’s pretty exciting.

Anna Sharma (27:32)
Well, I don’t know of any other solution that can enable a startup or a company to get that public data and have it structured. There are many companies that build their business on top of the proxy infrastructure we provide; that’s most of our customers. That’s the layer between the business and the AI and the search engines. It’s an essential part of it. You can’t take it out. You need it.

Beau Hamilton (28:02)
Right. Yeah. It’s not just tech, it’s partnership. At the end of the day, it’s automation paired with human support, and you’ve always got to have that close relationship to really be successful, get the most out of it, and endure this change.

And I think you guys are doing a good job. And I appreciate all the insights you’ve shared with us. And I hope listeners are taking note. I’m sure they are. But I just want to ask, for those curious and wanting to learn more about NetNut and some of its products and solutions, where should they go?

Anna Sharma (28:42)
Well, you’re welcome to visit our website, netnut.io, and you can find us on LinkedIn or personally connect with me; I’m happy to answer your questions. I’ve been in the proxy industry since, I think, 2019, so I’ve been in this market for six years already. If you want to ask me something, I’m open: my LinkedIn, our company’s LinkedIn, and our company website.

Beau Hamilton (29:14)
All right, netnut.io and go visit them on LinkedIn, get in touch with Anna. Anna, you’re a wealth of information. I appreciate everything that you shared with us. And yeah, just thanks for taking the time out of your day to sit down and talk with us about this industry. I think I’ve learned a lot.

Anna Sharma (29:29)
Thank you Beau for having me and leading this conversation.

Beau Hamilton (29:34)
Absolutely. All right, everyone, that’s Anna Sharma, Marketing Communication Manager at NetNut. Thanks again. I really appreciate it.

And thank you all for listening to the SourceForge Podcast. I’m your host, Beau Hamilton. Make sure to subscribe to stay up to date with all of our upcoming B2B software related podcasts. I will talk to you in the next one.