Batch-Downloading Plain-Text Web Data: NOAA as an Example

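This article describes how to obtain climate data from NOAA in two ways: FTP and a crawler (the requests library). FTP is simple and direct but may run into network stalls; the crawler requires constructing the request URLs, so it suits URLs that follow a regular pattern. The sample code shows how to download climate data files of a specific format.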


Contents

NOAA

Download URL:

About the page data:

Comparing the two approaches:

Crawler (requests)

FTP (ftplib)

Xunlei


NOAA

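NOAA usually refers to the U.S. National Oceanic and Atmospheric Administration. It focuses on changes in the Earth's atmosphere and oceans, issues warnings of hazardous weather, provides nautical and aeronautical charts, manages the use and protection of ocean and coastal resources, and conducts research to improve understanding and stewardship of the environment.

NOAA hosts a wide range of atmospheric, oceanic, meteorological, and other data. This post downloads atmospheric data used in quantitative remote sensing, all of which are text files.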

A list of NOAA datasets, somewhat dated, reposted from:

NOAA data list (yangcan69, Sina Blog): http://blog.sina.com.cn/s/blog_d8f6ec6b0102wu9c.html

Dataset | Areal Coverage | Grid Size | Time Step | Time Coverage | Levels
CMAP Precipitation | Global | 2.5°x2.5° | Monthly, Pentad | 1979- | None
COBE-SST | Global | 1.0°x1.0° | Monthly | 1891-present | None
COBE-SST2 Sea Surface Temperature | Global | 1.0°x1.0° | Monthly | 1850-2012 | None
CPC .25x.25 Daily US Unified Precipitation | U.S. | .25°x.25° | Daily | 1948-2006 | None
CPC Soil Moisture | Global | 2.5°x2.5° | Monthly | 1948-present | None
CRU Air Temperature and Combined Air Temperature/Marine Anomalies V3 | Global | 5.0°x5.0° | Monthly | 1850-present | None
CRU Air Temperature and Combined Air Temperature/Marine Anomalies V4 | Global | 5.0°x5.0° | Monthly | 1850-2013 | None
Dai Palmer Drought Severity Index | Global: 58S-76N | 2.5°x2.5° | Monthly | 1850-2014 | None
GFS Model Output | Global | 2.5°x2.5° | 2X Daily | 1979-present | varied levels
GHCN Version 3 Land Temperature Dataset | Global | 5.0°x5.0° | Monthly | 1900-present | None
GHCN version 2 Land Precipitation Dataset | Global | 5.0°x5.0° | Monthly | 1900-present | None
NOAA GHCN_CAMS Land Temperature Analysis | Global | 0.5°x0.5° | Monthly | 1948-present | None
GISS surface temperature analysis | Global | 1.0°x1.0° | Monthly | 1880-present | None
Global Precipitation Climatology Centre (GPCC) | Global | 0.5°x0.5°, 1.0°x1.0°, 2.5°x2.5° | Monthly | 1901-present | None
GPCP V2.3 Precipitation | Global | 2.5°x2.5° | Monthly | 1979-2013 | None
ICOADS | Global | 2.0°x2.0°, 1.0°x1.0° | Monthly | 1800-present | None
Interpolated OLR | Global | 2.5°x2.5° | Daily, Monthly | 1979-near present | None
Kaplan SST | Global | 5.0°x5.0° | Monthly | 1856-present | None
Livneh daily CONUS near-surface gridded meteorological and derived hydrometeorological data | CONUS | 0.06°x0.06° | Daily, Monthly | 1915-2011 | None
MSU | Global | 2.5°x2.5° | Daily, Monthly | 1979-1996 | None
NCEP Operational Analysis | Global | 2.5°x2.5° | Daily | 1979-present | None
NCEP Marine | Global | 2.0°x2.0° | Monthly | 1991-present | None
NCEP/NCAR Reanalysis | Global | 2.5°x2.5°, T42 Gaussian, T62 spectral | 4X Daily, Daily, Monthly | 1948-present | 17 pressure levels, 28 spectral
NCEP/NCAR Reanalysis Products Derived at PSD | Global | 2.5°x2.5°, T62 Gaussian | Daily, Monthly | 1948-present | 17 pressure levels, 28 spectral
NCEP/DOE Reanalysis II | Global | 2.5°x2.5° | 4X Daily, Daily, Monthly | 1979-Dec 2012 | 17 Pressure levels
NOAA Extended Reconstructed SST V3b | Global | 1.0°x1.0° | Monthly | 1854-present | None
NOAA Extended Reconstructed SST V4 | Global | 1.0°x1.0° | Monthly | 1854-present | None
NOAA's Outgoing Longwave Radiation–Daily Climate Data Record (OLR–Daily CDR): PSD Interpolated Version | Global | 1.0°x1.0° | Daily | 1979-2012 | None
NOAA High-resolution Blended Analysis of Daily SST and Ice | Global | .25°x.25° | Daily | 1981-present | None
NOAA Global Surface Temperature (NOAAGlobalTemp) | Global | 5.0°x5.0° | Daily | 1880-present | None
NOAA Optimum Interpolation (OI) SST V2 | Global | 1.0°x1.0° | Monthly, Weekly | 1981-present | None
NODC (Levitus) World Ocean Atlas 1994 | Global | 1.0°x1.0° | Monthly, Annual | Climo | None
NODC (Levitus) World Ocean Atlas 1998 | Global | 1.0°x1.0° | Monthly, Annual | Climo | None
NOAA's Precipitation Reconstruction (PREC) | Global | 2.5°x2.5° | Monthly | 1979-present | None
NOAA's Precipitation Reconstruction over Land (PREC/L) | Global | 2.5°x2.5° | Monthly | 1948-present | None
North American Regional Reanalysis (NARR) | Northern Hemisphere | 32km grids | 8X Daily, Daily, Monthly | 1979-Dec 2012 | 28 Pressure
Northern Hemisphere EASE-Grid Snow Cover and Sea Ice Extent | Northern Hemisphere | 1.0°x1.0° | Monthly | 1971-1995 | None
NOAA-CIRES 20th Century Reanalysis (V2) | Global | 2.5°x2.5° | 4X Daily | 1871-2012 | 24 pressure levels
NOAA-CIRES 20th Century Reanalysis (V2c) | Global | 2.5°x2.5° | 4X Daily | 1851-2014 | 24 pressure levels
U. of Delaware Precipitation and Air Temperature | Global Land | 0.5°x0.5° | Monthly | 1900-2014 | None
Uninterpolated OLR | Global | 2.5°x2.5° | Daily | 1991-present | None

Download URL:

Uninterpolated OLR

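About the page data:

The page content is displayed as plain text.

The server responds with a single text document.

The page itself is the download page, and it places no restrictions on crawlers.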
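To see what "a single text document" means in practice, the sketch below fetches one file and prints the status code, the Content-Type header, and the first few lines. The URL is built from the sample file name used in the crawler code in this post (fpk_19970406.aod). It uses urllib from the standard library rather than requests so that it runs without extra packages; the helper names `head_lines` and `inspect` and the 30-second timeout are assumptions for illustration.

```python
from urllib.request import urlopen

# Sample file following the naming pattern used in this post
SAMPLE_URL = ("https://gml.noaa.gov/aftp/data/radiation/surfrad/aod/"
              "fpk/1997/fpk_19970406.aod")


def head_lines(text, n=5):
    """Return the first n lines of a text document."""
    return text.splitlines()[:n]


def inspect(url=SAMPLE_URL):
    """Fetch one file and print what the server sends back."""
    with urlopen(url, timeout=30) as response:
        print(response.status)
        print(response.headers.get("Content-Type"))
        # Decode leniently; the files are expected to be plain text
        text = response.read().decode("utf-8", errors="replace")
    for line in head_lines(text):
        print(line)
```

Calling `inspect()` needs network access. If the server behaves as described, the output is a success status followed by the opening lines of the data file.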

 

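Comparing the two approaches:

The crawler has to construct every request URL, which is tedious; if the URLs follow no regular pattern, this approach cannot handle them at all.

FTP is simple and easy to use, though it may occasionally stall.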

Crawler (requests)

import os

import requests

# Directory being downloaded
BASE_URL = "https://gml.noaa.gov/aftp/data/radiation/surfrad/aod/fpk"


def download():
    """Download every daily AOD file under BASE_URL for 1997-2021."""
    # range() excludes its end point, hence the + 1
    for year in range(1997, 2021 + 1):
        for month in range(1, 12 + 1):
            for day in range(1, 31 + 1):
                # Build the file name, e.g. fpk_19970406.aod
                file_name = f"fpk_{year}{month:02d}{day:02d}.aod"

                # Local path where the file is saved
                file_path = os.path.join("./data/aod/fpk", str(year), file_name)

                # Request URL; joined with "/" because os.path.join
                # would insert backslashes on Windows
                url = f"{BASE_URL}/{year}/{file_name}"
                print(file_path)
                print(url)

                response = requests.get(url, timeout=30)
                print(response.status_code)

                # Some days have no data (and dates such as Feb 30 do not
                # exist), so only successful responses are saved
                if response.status_code == 200 and response.content:
                    os.makedirs(os.path.dirname(file_path), exist_ok=True)
                    # Write the raw bytes so no encoding has to be guessed
                    with open(file_path, "wb") as file:
                        file.write(response.content)


def main():
    download()


if __name__ == '__main__':
    main()
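
The triple loop above also requests dates that do not exist, such as February 30, and relies on the server answering 404 for them. A minimal sketch of an alternative, assuming the same fpk_YYYYMMDD.aod naming pattern: step through real calendar days with datetime so that only valid dates are requested. The function name `daily_urls` is hypothetical.

```python
from datetime import date, timedelta

BASE_URL = "https://gml.noaa.gov/aftp/data/radiation/surfrad/aod/fpk"


def daily_urls(first_year, last_year):
    """Yield (file_name, url) for every calendar day in the year range."""
    day = date(first_year, 1, 1)
    end = date(last_year, 12, 31)
    while day <= end:
        # date objects accept strftime codes in format specs
        file_name = f"fpk_{day:%Y%m%d}.aod"
        yield file_name, f"{BASE_URL}/{day.year}/{file_name}"
        day += timedelta(days=1)
```

Each URL can then be passed to `requests.get` exactly as in the loop above; a 404 then only means that no data exist for that day.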

FTP (ftplib)

from ftplib import FTP
from pathlib import Path

# Remote directory; mirrors
# https://gml.noaa.gov/aftp/data/radiation/surfrad/aod/gwn/
REMOTE_ROOT = "/data/radiation/surfrad/aod/gwn"
# Local directory where the files are saved
LOCAL_ROOT = Path("D:/js/python/code/data/aod/gwn")


def main():
    # One anonymous login is reused for every year
    with FTP("ftp.gml.noaa.gov") as ftp:
        ftp.login()
        for year in range(1997, 2021 + 1):
            # Remote paths always use "/", so build them as plain strings
            remote_dir = f"{REMOTE_ROOT}/{year}"
            local_dir = LOCAL_ROOT / str(year)
            print(remote_dir)
            print(local_dir)

            local_dir.mkdir(parents=True, exist_ok=True)
            ftp.cwd(remote_dir)

            # List every file in this year's directory and download each one
            for name in ftp.nlst():
                print("Downloading..." + name)
                # The with block closes the local file after each transfer
                with open(local_dir / name, "wb") as out:
                    ftp.retrbinary(f"RETR {name}", out.write)


if __name__ == '__main__':
    main()
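
Because an FTP transfer may stall, it can help to set a socket timeout and to skip files that an earlier run already saved, so that rerunning the script picks up where it stopped. The sketch below assumes the same server and directory layout as the FTP script above; the helper names `missing_files` and `fetch_year`, the `.part` suffix, and the 60-second timeout are assumptions for illustration.

```python
from ftplib import FTP
from pathlib import Path


def missing_files(remote_names, local_dir):
    """Return the remote file names that are not yet saved in local_dir."""
    local_dir = Path(local_dir)
    return [name for name in remote_names if not (local_dir / name).exists()]


def fetch_year(year, local_root, timeout=60):
    """Download one year's files, skipping those already on disk."""
    local_dir = Path(local_root) / str(year)
    local_dir.mkdir(parents=True, exist_ok=True)
    # With a timeout, a stalled connection raises instead of hanging
    with FTP("ftp.gml.noaa.gov", timeout=timeout) as ftp:
        ftp.login()
        ftp.cwd(f"/data/radiation/surfrad/aod/gwn/{year}")
        for name in missing_files(ftp.nlst(), local_dir):
            # Write to a temporary name first so an interrupted transfer
            # is not mistaken for a finished file on the next run
            partial = local_dir / (name + ".part")
            with open(partial, "wb") as out:
                ftp.retrbinary(f"RETR {name}", out.write)
            partial.replace(local_dir / name)
```

If a run fails partway through, calling `fetch_year` again for the same year downloads only the files that are still missing.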

Xunlei

First use the Chrome extension Link Grabber to extract all the links, then download them with Xunlei (Thunder).

Reference: How to download SST data from NOAA (Andrew_SJ's blog, CSDN): https://blog.csdn.net/Andrew_SJ/article/details/108488379

Excerpt from that post: NOAA provides gridded climate data, including SST data. Many kinds of data are listed; choose one, click Catalog on the right to open the data catalog, and find the entries to download. Use the Chrome extension Link Grabber to extract all the links, then copy the ones you need. Pasting these links straight into Xunlei does not work; opening an individual entry shows its HTTP server download link: https://psl.noaa.gov/thredds/
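
The link-extraction step can also be scripted. The sketch below is a rough stand-in for what Link Grabber does, using only the standard library: it collects the href of every anchor in an HTML page, resolves it against the page URL, and keeps the links that end with a given suffix. The class and function names are made up for illustration, and nothing is assumed about the markup of any real NOAA page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, page_url, suffix=""):
    """Return absolute URLs of the links in html that end with suffix."""
    parser = LinkCollector()
    parser.feed(html)
    # urljoin turns relative hrefs into full URLs based on the page URL
    return [urljoin(page_url, h) for h in parser.hrefs if h.endswith(suffix)]
```

The HTML would come from fetching the catalog page, for example with `requests.get(page_url).text`. The resulting list can be written one URL per line to a text file and handed to a download manager.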

 
