Once the webhook could return 202 and tasks could land in the database, a new set of problems surfaced. If I trust whoever POSTs to me by default, the endpoint is barely different from a debug interface running naked on the internal network. If GitHub retries a delivery, do I create a second task? When one review runs slow, do the other tasks in the queue ever get CPU time? Until these questions are closed off, adding OAuth or multi-repo support later just stacks features on an even more brittle entry point.

The habit in the earliest version of the repo was to bind the body straight into a Pydantic model: the handler was short, and local demos were fast. But signature verification has to compute the HMAC over the raw bytes; once the framework has already parsed the body, lining it back up with `X-Hub-Signature-256` gets awkward. So the current handler reads the body in one piece first, then does signature checking, decoding, and routing against the same `raw_body`:

```python
@router.post(
    "/webhooks/github",
    status_code=status.HTTP_202_ACCEPTED,
)
async def github_webhook(
    request: Request,
    db: Session = Depends(get_db_session),
) -> dict[str, object]:
    settings = get_settings()
    raw_body = await request.body()
    signature = request.headers.get("X-Hub-Signature-256")
    delivery_id = request.headers.get(_GITHUB_DELIVERY_HEADER)
    event_name = request.headers.get(_GITHUB_EVENT_HEADER)
    request_timestamp = _extract_request_timestamp(request)

    # Everything below works off the same raw bytes: verify first, parse later.
    _validate_github_signature(signature=signature, raw_body=raw_body, settings=settings)
    _validate_delivery_headers(delivery_id=delivery_id, event_name=event_name, settings=settings)
    _validate_replay_time_window(request_timestamp=request_timestamp, settings=settings)

    normalized_event_name = (event_name or "pull_request").strip().lower()
    payload_data = _decode_github_webhook_payload(
        raw_body=raw_body, content_type=request.headers.get("content-type")
    )
    if normalized_event_name == "pull_request":
        payload = _validate_github_pull_request_payload(payload_data)
        actor = payload.pull_request.user.login
        task = create_review_task_from_github_webhook(
            db,
            payload,
            actor,
            delivery_id=delivery_id,
            event_name=normalized_event_name,
            raw_body=raw_body,
        )
        return serialize_review_task_summary(task).model_dump(mode="json")
```

The signature check is split out into its own function, and the logic is plain: decide from config whether this request must be verified; if so, compute the sha256 digest and compare with `compare_digest` to close the timing side channel. When mocking locally I often turn mandatory verification off, but since switching to `github` mode should tighten things automatically, I wrote "when verification is mandatory" into the condition itself instead of trusting a human to remember to flip the environment:

```python
def _validate_github_signature(*, signature: str | None, raw_body: bytes, settings: Settings) -> None:
    should_validate = (
        settings.github_webhook_require_signature
        or settings.git_provider_mode.strip().lower() == "github"
        or bool(signature)
    )
    if not should_validate:
        return
    if not signature:
        raise HTTPException(status_code=401, detail="Missing GitHub webhook signature.")
    # GitHub's header format is "sha256=" followed by the HMAC-SHA256 hex digest
    # of the raw request body.
    expected = "sha256=" + hmac.new(
        settings.github_webhook_secret.encode("utf-8"),
        raw_body,
        hashlib.sha256,
    ).hexdigest()
    if not hmac.compare_digest(signature, expected):
        raise HTTPException(status_code=401, detail="GitHub webhook signature mismatch.")
```

The delivery and event headers get their own validation because both idempotency and debugging depend on them: without `X-GitHub-Delivery` there is no reliable way to identify each delivery in the database, and without `X-GitHub-Event`, once `issue_comment` and `pull_request` share one URL, routing becomes guesswork. `_validate_delivery_headers` decides whether to enforce based on `deployment_mode` or a toggle, so things stay loose locally and strict in deployment without code changes.

The part that actually earns its keep is the service layer: `create_review_task_from_github_webhook` now takes `delivery_id` and `raw_body`, computes a fingerprint first, then looks up `GitHubWebhookDelivery`. A hit on an existing delivery only writes a `webhook.duplicate_skipped` audit entry and returns the old task; a matching fingerprint inside the replay window goes down the `webhook.replay_window_skipped` path. The audit writes felt verbose at first, but without those few lines the delivery list in GitHub's UI and the local database simply cannot be reconciled.

```python
def create_review_task_from_github_webhook(
    db: Session,
    payload: GitHubPullRequestWebhookPayload,
    actor: str,
    *,
    delivery_id: str | None = None,
    event_name: str = "pull_request",
    raw_body: bytes | None = None,
) -> ReviewTask:
    slug = payload.repository.full_name
    # Fingerprint the exact bytes GitHub sent; fall back to the re-serialized
    # payload only when no raw body is available.
    fingerprint = hashlib.sha256(raw_body or payload.model_dump_json().encode("utf-8")).hexdigest()
    replay_window_seconds = max(int(get_settings().github_webhook_replay_window_seconds), 0)

    if delivery_id:
        existing_delivery = db.scalar(
            select(GitHubWebhookDelivery).where(GitHubWebhookDelivery.delivery_id == delivery_id)
        )
        if existing_delivery is not None and existing_delivery.review_task_id is not None:
            existing_task = _get_review_task_with_context(db, existing_delivery.review_task_id)
            if existing_task is not None:
                db.add(AuditLog(action="webhook.duplicate_skipped"))
                db.commit()
                return existing_task

    recent_replay = _find_recent_replayed_delivery(
        db,
        delivery_id=delivery_id,
    )
    if recent_replay is not None and recent_replay.review_task_id is not None:
        existing_task = _get_review_task_with_context(db, recent_replay.review_task_id)
        if existing_task is not None:
            db.add(AuditLog(action="webhook.replay_window_skipped"))
            db.commit()
            return existing_task
    # ... excerpt ends here: the non-duplicate path records the delivery
    # and creates a fresh ReviewTask.
```
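Because I keep flipping mandatory verification on and off locally, the one thing that keeps this honest is a regression test that signs the raw bytes the same way GitHub does. Here is a minimal sketch, assuming the httpx-based FastAPI `TestClient` and a hypothetical `app.main:app` import path; the secret and test names are illustrative, not the project's real ones:

```python
import hashlib
import hmac

from fastapi.testclient import TestClient

from app.main import app  # hypothetical import path for the FastAPI app

client = TestClient(app)


def _github_signature(raw_body: bytes, secret: str) -> str:
    # Same format as GitHub's X-Hub-Signature-256: "sha256=" + HMAC-SHA256
    # hex digest computed over the raw request bytes.
    return "sha256=" + hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def test_tampered_signature_is_rejected() -> None:
    raw_body = b'{"action": "opened"}'
    response = client.post(
        "/webhooks/github",
        content=raw_body,  # send the exact bytes the signature was computed over
        headers={
            "X-Hub-Signature-256": _github_signature(raw_body, "not-the-real-secret"),
            "X-GitHub-Delivery": "00000000-0000-0000-0000-000000000000",
            "X-GitHub-Event": "pull_request",
            "Content-Type": "application/json",
        },
    )
    # The handler verifies the HMAC before parsing, so a wrong secret should
    # yield 401 regardless of whether the payload itself would validate.
    assert response.status_code == 401
```

The useful property of signing in the test instead of disabling verification is that the "raw bytes first" contract gets exercised on every run: if someone reorders parsing before verification, this test breaks immediately.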
With the entry point tightened, the problems on the async side float to the surface. Running with `CELERY_TASK_ALWAYS_EAGER=true` locally is convenient for long stretches, but to simulate "HTTP answers in a second while the review grinds on" you need Redis plus a real worker. Workers default to a large prefetch, and I have watched tasks sit in the queue without ever getting scheduled, so in the application I pinned `worker_prefetch_multiplier` to 1 and gave review tasks soft and hard timeouts so a single task cannot occupy a worker to death (a sketch of what the task side can do with the soft limit is at the end of this post):

```python
celery_app = Celery(
    "codeguard",
    broker=settings.celery_broker_url,
    backend=settings.celery_result_backend,
    include=["app.tasks.review_tasks"],
)
celery_app.conf.update(
    task_always_eager=settings.celery_task_always_eager,
    task_serializer="json",
    result_serializer="json",
    accept_content=["json"],
    timezone="UTC",
    # Pull one task at a time so a slow review never starves the queue.
    worker_prefetch_multiplier=1,
    task_soft_time_limit=max(0, int(settings.review_task_worker_soft_timeout_seconds)),
    task_time_limit=max(0, int(settings.review_task_worker_hard_timeout_seconds)),
)
```

On the deployment side I didn't want concurrency baked into the image command: in docker-compose an environment variable overrides `--concurrency`, combined with `--prefetch-multiplier=1` and `-O fair`, so changing concurrency only means touching `.env` or the compose file:

```yaml
worker:
  build:
    context: ./backend
  command: celery -A app.tasks.celery_app.celery_app worker --loglevel=info --concurrency=${CELERY_WORKER_CONCURRENCY:-1} --prefetch-multiplier=1 -O fair
```

Compared against the earlier snapshot I still have, the webhook is still one POST, but its responsibilities are completely different: that version was "bind the model and go straight into business logic", this one is "bytes first, then trust, then routing, then idempotency". This isn't a verdict on which is better; it's the refactor that naturally happens when a project grows to this point. Get the pipeline working first, then backfill trust and scheduling; otherwise you are pinned down by secrets, signatures, and queue parameters before anything works at all. If OAuth and repository connection state come next, I will try to hang the new capabilities on this already-tightened path rather than open a second entrance. That way, reading my own code, I can tell which part answers "is this really GitHub", which part answers "does the same message count twice", and which part answers "don't wedge yourself once the work is done".

Taken one by one, these changes look like adding a check, adding a table, touching compose. Stacked together, their meaning is concrete: the webhook went from "can receive" to "receives correctly and reliably", and reviews went from "blocking along with HTTP" to "can queue, can time out, can tune concurrency". Whether what comes later is OAuth, multiple repositories, or comment publishing, none of it will require tearing the entrance apart again. For me, the value of this middle stretch is that it compacted the foundation that costs the most rework, and it removed a lot of the nagging "what if this request is forged" unease when writing business endpoints.

For a code review assistant, a developer opening a PR wants well-grounded, traceable, confirmable review feedback as soon as possible, not one more toy endpoint that misfires, double-delivers, and wedges in the background. With webhook authentication, idempotency, and worker scheduling tightened, one PR event reliably becomes exactly one queryable review task, and the review pipeline gets its chance to pull the diff, run analysis, and draft comments in the background. That is what puts the assistant on the "actually helpful" side instead of adding noise to the team.
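As promised above, here is what the task side can do with that soft limit. This is a minimal sketch under my own assumptions: `run_review_task` and `mark_review_failed` are hypothetical names standing in for the real pipeline, and the `sleep` is a placeholder for the slow review work. The one real mechanism it shows is Celery raising `SoftTimeLimitExceeded` inside the task when `task_soft_time_limit` elapses, leaving a window to persist a terminal state before the hard `task_time_limit` kills the process outright:

```python
import time

from celery.exceptions import SoftTimeLimitExceeded

from app.tasks.celery_app import celery_app  # the Celery app configured above


def mark_review_failed(review_task_id: int, reason: str) -> None:
    """Hypothetical cleanup hook: persist a terminal 'failed' state for the task."""


@celery_app.task(bind=True, name="app.tasks.review_tasks.run_review_task")
def run_review_task(self, review_task_id: int) -> None:
    try:
        # Placeholder for the real work: pull the diff, run analysis, draft comments.
        time.sleep(600)
    except SoftTimeLimitExceeded:
        # Raised inside the task when the soft limit elapses; record a terminal
        # state so the task doesn't stay "running" forever, then re-raise so
        # Celery marks the task failed.
        mark_review_failed(review_task_id, reason="review timed out")
        raise
```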