# Asynchronous Communication Instructions Explained: TPUT_ASYNC / TGET_ASYNC / BuildAsyncSession

pto-isa: Parallel Tile Operation (PTO) is a virtual instruction set architecture designed by Ascend CANN, focusing on tile-level operations. This repository offers high-performance, cross-platform tile operations across Ascend platforms.

Project address: https://gitcode.com/cann/pto-isa

Disclosure: parts of this article were generated with AI assistance (AIGC) and are for reference only.

## TPUT_ASYNC: asynchronous remote write

Starts a GM→GM DMA transfer and immediately returns an `AsyncEvent`.

```cpp
template <DmaEngine engine = DmaEngine::SDMA,
          typename GlobalDstData, typename GlobalSrcData, typename... WaitEvents>
AsyncEvent TPUT_ASYNC(GlobalDstData &dst, GlobalSrcData &src,
                      const AsyncSession &session, WaitEvents &... events);
```

## TGET_ASYNC: asynchronous remote read

Starts a DMA transfer from remote GM to local GM.

```cpp
template <DmaEngine engine = DmaEngine::SDMA,
          typename GlobalDstData, typename GlobalSrcData, typename... WaitEvents>
AsyncEvent TGET_ASYNC(GlobalDstData &dst, GlobalSrcData &src,
                      const AsyncSession &session, WaitEvents &... events);
```

## BuildAsyncSession: building an asynchronous session

### SDMA build (default)

```cpp
template <DmaEngine engine = DmaEngine::SDMA, typename ScratchTile>
bool BuildAsyncSession(ScratchTile &scratchTile, __gm__ uint8_t *workspace,
                       AsyncSession &session, uint32_t syncId = 0,
                       const sdma::SdmaBaseConfig &baseConfig = {sdma::kDefaultSdmaBlockBytes, 0, 1},
                       uint32_t channelGroupIdx = sdma::kAutoChannelGroupIdx);
```

Parameters:

- `scratchTile`: a UB scratch tile that holds SDMA control metadata (not the data payload). The recommended type is `Tile<TileType::Vec, uint8_t, 1, comm::sdma::UB_ALIGN_SIZE>` (256 B).
- `workspace`: a GM pointer allocated on the host side by `SdmaWorkspaceManager`.
- `syncId`: the MTE3/MTE2 pipe synchronization event ID (0-7). Choose a value that does not conflict with other pipe barriers in the kernel.
- `baseConfig`: `{block_bytes, comm_block_offset, queue_num}`. The default suits single-queue scenarios.
- `channelGroupIdx`: the SDMA channel group index. By default it is mapped from `get_block_idx()`.

### URMA build (Ascend950 / NPU_ARCH 3510 only)

```cpp
bool BuildAsyncSession(__gm__ uint8_t *workspace, uint32_t destRankId, AsyncSession &session);
```

## Asynchronous constraints

- Only flat, contiguous, logically one-dimensional tensors are supported. A tensor that is not one-dimensional yields an invalid event.
- The SDMA workspace must be allocated on the host side by `SdmaWorkspaceManager`.
- The URMA workspace must be allocated on the host side by `UrmaWorkspaceManager`.
- URMA requires huge-page memory (`ACL_MEM_MALLOC_HUGE_ONLY`). Small-page allocations cause registration to fail.
- `scratchTile` is used only for control metadata. It is not a data staging buffer.

## Completion semantics (quiet semantics)

`event.Wait(session)` blocks until every asynchronous operation issued since the previous `Wait` has completed. After several asynchronous calls, a single `Wait` on the last `AsyncEvent` is sufficient, much like the quiet semantics of shmem.

## Complete example

```cpp
// Build the session
using ScratchTile = Tile<TileType::Vec, uint8_t, 1, comm::sdma::UB_ALIGN_SIZE>;
ScratchTile scratchTile;
TASSIGN(scratchTile, 0x0);

comm::AsyncSession session;
if (!comm::BuildAsyncSession<comm::DmaEngine::SDMA>(scratchTile, sdmaWorkspace, session)) {
    return;
}

// Batched transfers + a single Wait
comm::AsyncEvent lastEvent;
for (int rank = 0; rank < nranks; ++rank) {
    GT dstG(remoteDst + rank * size, shape, stride);
    lastEvent = comm::TPUT_ASYNC(dstG, srcG, session);
}
(void)lastEvent.Wait(session);  // wait for all pending operations to complete
```
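To make the quiet-style completion rule more concrete, here is a minimal host-side sketch that models it in plain C++. It is not the pto-isa API: `MockSession`, `MockEvent`, `MockPutAsync`, and `BroadcastAndQuiet` are hypothetical names invented for illustration, and the "DMA" is just a deferred `memcpy`. The sketch assumes only what the article states: an asynchronous put returns immediately, and one `Wait` drains everything issued on the session since the previous `Wait`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical host-side model; none of these types belong to pto-isa.
struct MockSession {
    std::vector<std::function<void()>> pending;  // transfers issued but not yet completed
    std::size_t completed = 0;                   // total transfers drained so far
};

struct MockEvent {
    bool valid = false;

    // Quiet-style wait: drains every transfer issued on the session since
    // the last Wait, not only the one that produced this event.
    bool Wait(MockSession &session) const {
        if (!valid) {
            return false;  // an invalid event has nothing to wait for
        }
        for (auto &op : session.pending) {
            op();
            ++session.completed;
        }
        session.pending.clear();
        return true;
    }
};

// Queues a copy and returns immediately; the bytes move only when a Wait drains the queue.
inline MockEvent MockPutAsync(unsigned char *dst, const unsigned char *src,
                              std::size_t bytes, MockSession &session) {
    MockEvent ev;
    if (dst == nullptr || src == nullptr) {
        return ev;  // invalid event, mirroring the "invalid event" failure mode
    }
    session.pending.push_back([=] { std::memcpy(dst, src, bytes); });
    ev.valid = true;
    return ev;
}

// Issues one put per rank into a flat destination buffer, then waits once on the last event.
// Returns the number of transfers completed by that single Wait.
inline std::size_t BroadcastAndQuiet(std::vector<unsigned char> &remote,
                                     const std::vector<unsigned char> &src, int nranks) {
    assert(nranks >= 0);
    assert(remote.size() >= static_cast<std::size_t>(nranks) * src.size());
    MockSession session;
    MockEvent last;
    for (int rank = 0; rank < nranks; ++rank) {
        unsigned char *dst = remote.data() + static_cast<std::size_t>(rank) * src.size();
        last = MockPutAsync(dst, src.data(), src.size(), session);
    }
    (void)last.Wait(session);
    return session.completed;
}
```

In this model, calling `Wait` after every put would also be correct, but it would serialize the transfers. Waiting once on the last event is what lets the puts overlap, which is the pattern the batched example in this article follows.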