ai学习之在云端训练一个模型

张

张建站

2026/4/29 6:26:52

10分钟阅读

平台魔塔https://www.modelscope.cn/在上面创建一个notebook配置环境pip install transformers4.57 qwen_vl_utils0.0.14 pip install ms-swift3.9.1 pip install modelscope下载模型modelscope download --model Qwen/Qwen3-VL-2B-Instruct --local_dir /mnt/workspace/models/Qwen/Qwen3-VL-2B-Instruct上传数据集from modelscope.hub.api import HubApi # 1. 登录 api HubApi() api.login(命令牌) # 2. 上传数据集 # repo_id: 你的用户名/数据集仓库名 # folder_path: 本地包含图片、json、metadata.jsonl 的文件夹路径注意这里参数名也变了 # repo_type: 必须指定为 dataset否则默认会上传为模型 api.upload_folder( repo_idBECAUSEACC/rock, folder_path./, repo_typedataset, commit_messageupload dataset folder to repo )notebook上下载数据集modelscope download --dataset BECAUSEACC/rock --local_dir ./rock_data开始训练CUDA_VISIBLE_DEVICES0 swift sft \ --model /mnt/workspace/models/Qwen/Qwen3-VL-2B-Instruct \ --train_type lora \ --dataset ./data_swift/train_messages.jsonl \ --val_dataset ./data_swift/val_messages.jsonl \ --torch_dtype bfloat16 \ --num_train_epochs 3 \ --per_device_train_batch_size 1 \ --per_device_eval_batch_size 1 \ --learning_rate 5e-5 \ --lora_rank 16 \ --lora_alpha 32 \ --target_modules q_proj k_proj v_proj o_proj gate_proj up_proj down_proj \ --gradient_accumulation_steps 8 \ --eval_steps 300 \ --save_steps 300 \ --save_total_limit 2 \ --logging_steps 20 \ --output_dir ./output_qwen_vl_lora_v2 \ --gradient_checkpointing true \ --quant_method bnb \ --quant_bits 4训练完成后打包下载swift export \ --model /mnt/workspace/models/Qwen/Qwen3-VL-2B-Instruct \ --adapters ./output_qwen_vl_lora/v0-20260428-172313/checkpoint-38 \ --merge_lora true \ --output_dir ./qwen_vl_final_package测试模型import torch from transformers import AutoModelForImageTextToText, AutoProcessor from PIL import Image model_path rB:\Pycharm_PROJECT\picture\checkpoint-38-merged # 改为你的实际路径 image_path rB:\Pycharm_PROJECT\picture\my_images\train\010_olivinite\010_olivinite_3.jpg # 改为你的图片路径 print(正在加载模型...) model AutoModelForImageTextToText.from_pretrained( model_path, dtypetorch.bfloat16, # 注意改成 dtype消除警告 device_mapauto, trust_remote_codeTrue ) processor AutoProcessor.from_pretrained(model_path, trust_remote_codeTrue) print(模型加载完成) image Image.open(image_path) query 这张图片是什么岩石 messages [{role: user, content: [ {type: image, image: image}, {type: text, text: query} ]}] # ---------- 修正部分 ---------- # 方法1先获得文本模板再构造输入 text processor.apply_chat_template(messages, tokenizeFalse, add_generation_promptTrue) inputs processor(texttext, images[image], return_tensorspt).to(model.device) # ----------------------------- outputs model.generate(**inputs, max_new_tokens512, do_sampleTrue, temperature0.1) response processor.decode(outputs[0][inputs[input_ids].shape[1]:], skip_special_tokensTrue) print(f用户: {query}) print(f模型: {response})

发散创新：基于共享内存的高性能进程间通信机制实战解析在现代多核系统中，高效、低延迟的进程间通信（IPC）是构建

发散创新：基于共享内存的高性能进程间通信机制实战解析在现代多核系统中，高效、低延迟的进程间通信（IPC） 是构建高性能服务的关键。传统方式如管道、消息队列虽然稳定，但在高吞吐场景下性能受限。而共享内存&#xf…...

2026/4/29 6:18:54 阅读更多 →

C++多态编程：从原理到实战

一、多态核心概念1. 什么是多态？同一个行为，不同对象有不同实现。父类引用 / 指针指向子类对象，调用函数时，执行子类重写的版本。2. 多态价值降低耦合，代码高扩展父类统一接口，子类自由实现新增子类无需修…...

2026/4/29 6:17:24 阅读更多 →

男孩女孩培养学习兴趣有区别吗？脑科学真相+分龄专属学习软件实测推荐

🔔 专栏：育儿教育干货实操 | 分龄启蒙兴趣培养避坑指南做教育博主这么多年，后台被问得最多的高频问题，没有之一：“博主，养男孩和养女孩，培养学习兴趣真的有区别吗？”“同款学习APP&a…...

2026/4/29 6:08:28 阅读更多 →

PowerShell脚本编译终极指南：如何用Win-PS2EXE轻松打包脚本为EXE文件

PowerShell脚本编译终极指南：如何用Win-PS2EXE轻松打包脚本为EXE文件【免费下载链接】PS2EXE Module to compile powershell scripts to executables 项目地址: https://gitcode.com/gh_mirrors/ps/PS2EXE 还在为PowerShell脚本分发而烦恼吗？每次…...

2026/4/28 9:20:28 阅读更多 →