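Downloading files is a common task in Python. Whether you are fetching images and documents from a web page or pulling data through an API, knowing how to download files is an essential skill for developers. This article walks through several ways to download files in Python, covering basic implementations, more advanced techniques, and solutions to common problems.

I. Basic Methods: Downloading Files with the Standard Library and requests

1. Using urllib.request (Python built-in library)

```python
import urllib.request

url = "https://example.com/file.zip"
filename = "downloaded_file.zip"

try:
    urllib.request.urlretrieve(url, filename)
    print(f"File downloaded to: {filename}")
except Exception as e:
    print(f"Download failed: {e}")
```

Characteristics:

- No third-party library to install
- Suitable for simple download scenarios
- No ready-made progress display (only a low-level reporthook callback) and little detail for error handling

2. Using the requests library (recommended; third-party, install with pip install requests)

```python
import requests

url = "https://example.com/file.zip"
filename = "downloaded_file.zip"

try:
    # Stream the response so a large file is not loaded into memory at once
    response = requests.get(url, stream=True)
    response.raise_for_status()  # Raise an exception for 4xx/5xx responses
    with open(filename, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):  # Write in chunks
            if chunk:  # Skip empty keep-alive chunks
                f.write(chunk)
    print(f"File downloaded to: {filename}")
except requests.exceptions.RequestException as e:
    print(f"Download failed: {e}")
```

Advantages:

- A more concise API
- Supports streaming downloads, which suits large files
- Well-developed error handling
- Request headers, proxies, and other advanced options can be added

II. Advanced Techniques: Enhancing the Download

1. Showing download progress

```python
import requests
from tqdm import tqdm  # Requires: pip install tqdm

url = "https://example.com/large_file.zip"
filename = "large_file.zip"

try:
    response = requests.get(url, stream=True)
    response.raise_for_status()  # Do not save an error page as the file
    # Content-Length may be missing; pass None so tqdm shows a bar without a total
    total_size = int(response.headers.get("content-length", 0)) or None
    with open(filename, "wb") as f, tqdm(
        desc=filename,
        total=total_size,
        unit="iB",
        unit_scale=True,
        unit_divisor=1024,
    ) as bar:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
            bar.update(len(chunk))
    print("\nDownload complete!")
except Exception as e:
    print(f"Download failed: {e}")
```

2. Resuming an interrupted download

Resuming relies on the HTTP Range header, so it only works if the server supports range requests. The code below appends to the local file only when the server answers with 206 Partial Content; if the server ignores the header and sends the whole file, it starts over instead of corrupting the partial file.

```python
import os

import requests

url = "https://example.com/large_file.zip"
filename = "large_file.zip"

# Check whether part of the file has already been downloaded
downloaded_size = os.path.getsize(filename) if os.path.exists(filename) else 0
headers = {"Range": f"bytes={downloaded_size}-"} if downloaded_size else {}

try:
    response = requests.get(url, headers=headers, stream=True)
    if response.status_code == 416:
        # Range Not Satisfiable: usually the local file is already complete
        print("Nothing left to download.")
    else:
        response.raise_for_status()
        # 206 means the server honored the Range header, so append;
        # a plain 200 means it sent the whole file, so overwrite
        mode = "ab" if response.status_code == 206 else "wb"
        with open(filename, mode) as f:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)
        print("Download complete!")
except Exception as e:
    print(f"Download failed: {e}")
```

3. Multithreaded/asynchronous downloads (speeding up the download)

The example below uses a thread pool; each worker requests one byte range and writes it to its own part file, and the parts are merged only if every chunk succeeded. Like resuming, this requires a server that supports range requests.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

import requests


def download_chunk(url, start, end, filename, chunk_num):
    headers = {"Range": f"bytes={start}-{end}"}
    try:
        response = requests.get(url, headers=headers, stream=True)
        response.raise_for_status()
        if response.status_code != 206:
            # A 200 response would contain the whole file, not this chunk
            raise RuntimeError("server did not honor the Range header")
        with open(f"{filename}.part{chunk_num}", "wb") as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        return True
    except Exception as e:
        print(f"Chunk {chunk_num} failed: {e}")
        return False


def merge_files(filename, num_chunks):
    with open(filename, "wb") as outfile:
        for i in range(num_chunks):
            part_filename = f"{filename}.part{i}"
            with open(part_filename, "rb") as infile:
                shutil.copyfileobj(infile, outfile)  # Copy without loading the part into memory
            os.remove(part_filename)


url = "https://example.com/very_large_file.zip"
filename = "very_large_file.zip"
file_size = 1024 * 1024 * 100  # Assume the file is 100 MB; in practice read it from Content-Length
chunk_size = 1024 * 1024 * 10  # 10 MB per chunk
num_chunks = max(1, file_size // chunk_size)

# Download the chunks in a thread pool
with ThreadPoolExecutor(max_workers=5) as executor:
    futures = []
    for i in range(num_chunks):
        start = i * chunk_size
        # The last chunk runs to the end of the file and absorbs any remainder
        end = start + chunk_size - 1 if i != num_chunks - 1 else file_size - 1
        futures.append(executor.submit(download_chunk, url, start, end, filename, i))
    # Wait for every chunk to finish
    results = [future.result() for future in futures]

# Merge only if every chunk was downloaded successfully
if all(results):
    merge_files(filename, num_chunks)
    print("Download and merge complete!")
else:
    print("Some chunks failed; the parts were not merged.")
```

III. Solutions for Common Scenarios

1. Downloading all resources on a web page

```python
import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup  # Requires: pip install beautifulsoup4


def download_resources(url, output_folder="downloads"):
    os.makedirs(output_folder, exist_ok=True)
    try:
        response = requests.get(url)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")

        # Download images
        for img in soup.find_all("img"):
            img_url = img.get("src")
            if img_url and not img_url.startswith("data:"):
                # Resolve relative links against the page URL
                img_url = urljoin(url, img_url)
                try:
                    img_response = requests.get(img_url)
                    img_response.raise_for_status()
                    # Name the file after the URL path, ignoring any query string
                    img_name = os.path.basename(urlparse(img_url).path) or "image"
                    with open(os.path.join(output_folder, img_name), "wb") as f:
                        f.write(img_response.content)
                except Exception as e:
                    print(f"Image download failed: {e}")

        # CSS, JS, and other resources can be downloaded in a similar way
        print("Resource download complete!")
    except Exception as e:
        print(f"Page download failed: {e}")


download_resources("https://example.com")
```

2. Downloading through a proxy

```python
import requests

proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}

url = "https://example.com"

try:
    response = requests.get(url, proxies=proxies)
    response.raise_for_status()
    with open("page.html", "w", encoding="utf-8") as f:
        f.write(response.text)
    print("Download through proxy succeeded!")
except Exception as e:
    print(f"Download through proxy failed: {e}")
```

3. Handling redirects

```python
import requests

url = "http://example.com/redirecting_link"

try:
    response = requests.get(url, allow_redirects=True)  # Redirects are followed by default
    response.raise_for_status()
    final_url = response.url  # The URL after all redirects
    print(f"Final URL: {final_url}")

    # Save the final file
    with open("final_file.txt", "wb") as f:
        f.write(response.content)
except Exception as e:
    print(f"Download failed: {e}")
```

IV. Best Practices and Caveats

- Error handling: always add exception handling, because network requests can fail for many reasons.
- Resource cleanup: use with statements so files are closed properly.
- Large files: use streaming downloads (stream=True) and write in chunks.
- Security:
  - Verify SSL certificates (the default behavior)
  - Validate user-supplied URLs
  - Restrict file types and save paths
- Performance:
  - Choose a sensible chunk size (typically 8 KB to 1 MB)
  - Multithreaded downloads suit high-latency networks
  - Consider asynchronous I/O (such as aiohttp) for better concurrency

V. Complete Example: A Download Function with a Progress Bar

```python
import os

import requests
from tqdm import tqdm


def download_file(url, filename=None, chunk_size=8192):
    """
    Download a file and show a progress bar.

    :param url: URL of the file
    :param filename: name to save as (optional; derived from the URL by default)
    :param chunk_size: chunk size in bytes
    :return: path of the saved file, or None if the request failed
    """
    try:
        # Derive the file name from the URL if none was given
        if filename is None:
            # Drop the query string; fall back to a fixed name if the path ends in "/"
            filename = os.path.basename(url.split("?")[0]) or "downloaded_file"

        # Send the request
        response = requests.get(url, stream=True)
        response.raise_for_status()

        # Total size, if the server provides it (None lets tqdm run without a total)
        total_size = int(response.headers.get("content-length", 0)) or None

        # Write the file while updating the progress bar;
        # the with statement closes both even if an error occurs
        with open(filename, "wb") as f, tqdm(
            desc=filename,
            total=total_size,
            unit="iB",
            unit_scale=True,
            unit_divisor=1024,
        ) as progress_bar:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                progress_bar.update(len(chunk))

        print(f"\nFile saved to: {os.path.abspath(filename)}")
        return filename
    except requests.exceptions.RequestException as e:
        print(f"Download failed: {e}")
        return None


# Usage example
download_file("https://example.com/sample.pdf", "my_document.pdf")
```

Summary

Python offers several ways to download files, from the simple urllib, to the more capable requests library, to optimized approaches that combine multithreading or asynchronous I/O. Choose the method that fits your needs:

- Simple downloads: requests.get() plus a file write
- Large files: streaming download plus chunked writes
- Progress display: combine with tqdm
- High concurrency: multithreaded or asynchronous downloads
- Special requirements: proxies, resumable downloads, and other advanced features

With these techniques you can handle a wide range of file download scenarios and build more robust Python applications.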
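The security advice in section IV (validate user-supplied URLs, restrict file types and save paths) is given without code, so here is a minimal sketch of one way to apply it using only the standard library. The helper names `validate_url` and `safe_save_path`, the allowed schemes and extensions, and the `downloads` folder are all hypothetical choices for illustration; they are not part of the original article, and a real application would set its own policy.

```python
import os
from urllib.parse import unquote, urlparse

# Hypothetical policy for illustration: adjust to your application.
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_EXTENSIONS = {".zip", ".pdf", ".png", ".jpg"}


def validate_url(url):
    """Return True if the URL uses an allowed scheme and names a host."""
    parsed = urlparse(url)
    return parsed.scheme in ALLOWED_SCHEMES and bool(parsed.netloc)


def safe_save_path(url, output_folder="downloads"):
    """Build a save path inside output_folder, or raise ValueError."""
    if not validate_url(url):
        raise ValueError(f"Unsupported or malformed URL: {url}")

    # Keep only the last path component so "../" segments cannot escape the folder
    name = os.path.basename(unquote(urlparse(url).path))
    extension = os.path.splitext(name)[1].lower()
    if not name or extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"File type not allowed: {name!r}")

    base = os.path.abspath(output_folder)
    path = os.path.abspath(os.path.join(base, name))
    # Defensive check: the resolved path must stay inside the output folder
    if os.path.commonpath([base, path]) != base:
        raise ValueError(f"Unsafe save path: {path}")
    return path
```

One way to use it would be to compute the path first and pass it to the download function from section V, for example `download_file(url, safe_save_path(url))`, so that a rejected URL raises ValueError before any request is sent.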