Python Multithreading and Multiprocessing: A Practical Guide to Handling Concurrent Tasks Efficiently

Published: 2024-12-29 01:21


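In modern programming, concurrency is one of the key ways to improve program performance, especially for I/O-bound or CPU-bound workloads. Python offers two concurrency models, multithreading and multiprocessing, and each has its own use cases, strengths, and weaknesses.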

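This article explains how multithreading and multiprocessing differ and when to use each, then walks through code examples that show how to implement concurrent processing efficiently in Python.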

1. Basic Concepts of Concurrency

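In computing, concurrency means making progress on multiple tasks within the same period of time; the tasks may interleave rather than literally run at the same instant. The goal of concurrency is to improve a program's throughput and efficiency.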

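- Multithreading: multiple threads are created inside a single process. The threads share the process's memory space, so their overhead is small.
- Multiprocessing: multiple independent processes are created, each with its own memory space. The overhead is larger, but it avoids the limitation of the Global Interpreter Lock (GIL), because each process runs its own interpreter with its own GIL.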
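To illustrate that threads share their process's memory, here is a minimal sketch (a hypothetical example, not from the original article) in which four threads update one module-level counter. The threading.Lock is needed because `counter += 1` is a read-modify-write sequence, not an atomic operation.

```python
import threading

# Shared state: every thread in this process sees the same variable.
counter = 0
counter_lock = threading.Lock()


def increment(times):
    """Add 1 to the shared counter `times` times."""
    global counter
    for _ in range(times):
        # Hold the lock so two threads cannot interleave the read and the write.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("counter =", counter)  # prints counter = 40000 (4 threads x 10_000 increments)
```

A separate process would instead get its own copy of `counter`, so changes made in a child process are not visible to the parent without explicit communication.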

2. Multithreading vs. Multiprocessing

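| Feature | Multithreading | Multiprocessing |
|---|---|---|
| Suitable workloads | I/O-bound tasks (e.g., network requests, file reads/writes) | CPU-bound tasks (e.g., numerical computation, large-scale data processing) |
| Resource usage | Less memory; threads are created quickly | More memory; processes are created more slowly |
| Global Interpreter Lock (GIL) | Subject to the GIL in standard CPython: only one thread executes Python bytecode at a time, so CPU-bound Python code does not run in true parallel | Not limited by the GIL (each process has its own), so tasks can run truly in parallel on multiple cores |
| Communication | Shared variables (protected by locks) or thread-safe queues such as queue.Queue | multiprocessing.Queue, pipes, and similar mechanisms |
| Stability | A fatal error in one thread (e.g., a crash inside a C extension) can bring down the whole process | Processes are independent; one process crashing does not affect the others |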
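For the communication row of the table, here is a minimal producer/consumer sketch with the thread-safe queue.Queue. The `producer` and `consumer` functions and the `None` sentinel that signals "no more work" are illustrative choices, not part of the original article.

```python
import queue
import threading

tasks = queue.Queue()  # thread-safe FIFO: put() and get() do their own locking
results = []           # only the single consumer thread appends to this list

SENTINEL = None  # hypothetical end-of-work marker


def producer(n):
    """Put the numbers 0..n-1 on the queue, followed by the sentinel."""
    for i in range(n):
        tasks.put(i)
    tasks.put(SENTINEL)


def consumer():
    """Take items until the sentinel arrives, storing each item's square."""
    while True:
        item = tasks.get()  # blocks until an item is available
        if item is SENTINEL:
            break
        results.append(item * item)


p = threading.Thread(target=producer, args=(5,))
c = threading.Thread(target=consumer)
p.start()
c.start()
p.join()
c.join()

print(results)  # prints [0, 1, 4, 9, 16]
```

With one consumer the queue's FIFO order is preserved; with several consumers, each would need its own sentinel and the result order would no longer be guaranteed.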

3. Implementing Multithreading

Python's threading module provides a convenient interface for creating and managing threads and is well suited to I/O-bound tasks.

Example: fetching multiple web pages

The following code uses multiple threads to fetch several web pages and prints the number of characters in each.

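```python
import threading

import requests  # third-party package: pip install requests


# Thread task: download one page and report its length
def fetch_url(url):
    try:
        # A timeout keeps a stalled server from blocking the thread forever
        response = requests.get(url, timeout=10)
        print(f"URL: {url} - length: {len(response.text)}")
    except requests.RequestException as e:
        print(f"URL: {url} - error: {e}")


# Pages to fetch
urls = [
    "https://www.python.org",
    "https://www.djangoproject.com",
    "https://flask.palletsprojects.com",
    "https://fastapi.tiangolo.com",
]

# Create and start one thread per URL
threads = []
for url in urls:
    thread = threading.Thread(target=fetch_url, args=(url,))
    threads.append(thread)
    thread.start()

# Wait for all threads to finish
for thread in threads:
    thread.join()

print("All pages fetched!")
```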

Sample output (page lengths change over time, and the lines appear in whatever order the requests finish):

```
URL: https://www.python.org - length: 50094
URL: https://www.djangoproject.com - length: 29486
URL: https://flask.palletsprojects.com - length: 13589
URL: https://fastapi.tiangolo.com - length: 12456
All pages fetched!
```
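Threads help here even under the GIL because each request spends most of its time waiting on the network. The sketch below makes that measurable without network access by replacing the request with `time.sleep`, which, like socket I/O, releases the GIL while waiting. The `fake_fetch` function and the 0.3-second delays are hypothetical stand-ins, not part of the original example.

```python
import threading
import time


def fake_fetch(name, delay, results):
    """Hypothetical stand-in for a network call: wait, then record a result."""
    time.sleep(delay)  # releases the GIL, so other threads keep running
    results[name] = delay  # each thread writes its own key


delays = {"a": 0.3, "b": 0.3, "c": 0.3, "d": 0.3}
results = {}

start = time.perf_counter()
threads = [
    threading.Thread(target=fake_fetch, args=(name, d, results))
    for name, d in delays.items()
]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(f"{len(results)} tasks finished in {elapsed:.2f} s")
```

Run one after another, the four waits would add up to about 1.2 seconds; with threads they overlap, so the total stays close to 0.3 seconds. Exact timings depend on the machine.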

4. Implementing Multiprocessing

Python's multiprocessing module runs tasks in parallel across multiple CPU cores, which makes it a strong tool for CPU-bound work.

Example: parallel processing of a compute-bound task

The following code uses a pool of worker processes to compute the squares of a list of numbers.

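```python
import multiprocessing


# Process task: square one number
def calculate_square(number):
    print(f"Process {multiprocessing.current_process().name} computing the square of {number}")
    return number * number


# The __main__ guard is required when processes are started with "spawn"
# (the default on Windows and macOS); otherwise each child re-runs the pool setup
if __name__ == "__main__":
    # Numbers to square
    numbers = [1, 2, 3, 4, 5]

    # Create a pool of 3 worker processes; map() spreads the items over the
    # workers and returns the results in input order
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.map(calculate_square, numbers)

    print("Results:", results)
```

Sample output on Linux (names like ForkPoolWorker-1 come from the "fork" start method; which worker handles which number, and the print order, can vary between runs):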

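```
Process ForkPoolWorker-1 computing the square of 1
Process ForkPoolWorker-2 computing the square of 2
Process ForkPoolWorker-3 computing the square of 3
Process ForkPoolWorker-1 computing the square of 4
Process ForkPoolWorker-2 computing the square of 5
Results: [1, 4, 9, 16, 25]
```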
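The comparison table lists multiprocessing.Queue for communication between processes. Below is a minimal sketch using the lower-level multiprocessing.Process class instead of a pool; the `square_worker` function and the split of the input into chunks are hypothetical choices for illustration.

```python
import multiprocessing


def square_worker(numbers, out_queue):
    """Compute squares in a child process and send (n, n*n) pairs back."""
    for n in numbers:
        out_queue.put((n, n * n))


def main():
    out_queue = multiprocessing.Queue()
    chunks = [[1, 2], [3, 4], [5]]  # hypothetical split of the work
    procs = [
        multiprocessing.Process(target=square_worker, args=(chunk, out_queue))
        for chunk in chunks
    ]
    for p in procs:
        p.start()
    # Drain the queue before join(): a child with items still buffered in the
    # queue may not exit until they are consumed, which can deadlock join().
    total = sum(len(chunk) for chunk in chunks)
    results = dict(out_queue.get() for _ in range(total))
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    # Items arrive in whatever order the children finish, so sort for display
    print(sorted(main().items()))  # prints [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
```

Because each process has its own memory, the children cannot simply append to a list owned by the parent; the queue carries the results back.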

5. Thread Pools and Process Pools

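When many tasks need to run concurrently, creating threads or processes by hand quickly becomes tedious. Python provides thread pools and process pools to simplify their management.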

Example: thread pool

Use concurrent.futures.ThreadPoolExecutor to create a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


# Task function
def fetch_data(item):
    print(f"Processing {item}")
    return f"result: {item * 2}"


# Input data
data = [1, 2, 3, 4, 5]

# Use a pool of 3 worker threads; executor.map returns results in input order
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(fetch_data, data))

print("Results:", results)
```

Example: process pool

Use concurrent.futures.ProcessPoolExecutor to create a process pool.

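```python
from concurrent.futures import ProcessPoolExecutor


# Task function (defined at module level so it can be pickled and sent to workers)
def process_data(item):
    print(f"Process handling {item}")
    return item ** 3


# The __main__ guard is required when processes are started with "spawn"
# (the default on Windows and macOS)
if __name__ == "__main__":
    # Input data
    data = [1, 2, 3, 4, 5]

    # Use a pool of 3 worker processes
    with ProcessPoolExecutor(max_workers=3) as executor:
        results = list(executor.map(process_data, data))

    print("Results:", results)
```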
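executor.map returns results in input order and re-raises a task's exception only when that result is retrieved. When results should be handled as soon as each task finishes, or errors handled task by task, executor.submit combined with as_completed is an alternative. A sketch using a thread pool and a hypothetical `risky_task` that fails for one input:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def risky_task(item):
    """Hypothetical task that fails for one input to show per-task error handling."""
    if item == 3:
        raise ValueError(f"bad item: {item}")
    return item * 10


data = [1, 2, 3, 4, 5]
results = {}
errors = {}

with ThreadPoolExecutor(max_workers=3) as executor:
    # submit() schedules one call and immediately returns a Future
    future_to_item = {executor.submit(risky_task, item): item for item in data}
    # as_completed() yields futures as they finish, not in submission order
    for future in as_completed(future_to_item):
        item = future_to_item[future]
        try:
            results[item] = future.result()  # re-raises the task's exception, if any
        except ValueError as exc:
            errors[item] = str(exc)

print("results:", dict(sorted(results.items())))  # prints results: {1: 10, 2: 20, 4: 40, 5: 50}
print("errors:", errors)  # prints errors: {3: 'bad item: 3'}
```

The same submit/as_completed pattern works with ProcessPoolExecutor, provided the task function is defined at module level and the pool is created under an `if __name__ == "__main__":` guard.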