Obsidian and AI Knowledge Management

news 2026/5/28 20:37:11

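Obsidian + AI: Building a Second-Brain Knowledge Management System with LLMs

🧠 This article walks step by step through integrating Obsidian notes with large language models to build a personal knowledge management system that classifies notes automatically, searches them semantically, and recommends related notes. It includes complete, hands-on code for Python plugin development, vector retrieval, and RAG-based question answering.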

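Preface

I have used Obsidian for three years and accumulated more than 2,000 notes. As the collection grew, one core problem kept getting worse: I couldn't find the notes I had written, couldn't remember the ones I found, and couldn't put to use the ones I remembered.

Only after I connected a large language model to Obsidian were these problems really solved. This article records the full process of building the system from scratch, including every pitfall I hit along the way.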

1. System Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│          Obsidian + AI Knowledge Management System          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────────┐   │
│  │ Obsidian │◄──►│   Obsidian   │◄──►│  Python backend  │   │
│  │  Vault   │    │    plugin    │    │    (FastAPI)     │   │
│  │          │    │ (TypeScript) │    │                  │   │
│  └──────────┘    └──────────────┘    └────────┬─────────┘   │
│                                               │             │
│                      ┌────────────────────────┤             │
│                      ↓                        ↓             │
│                ┌───────────┐         ┌─────────────────┐    │
│                │ ChromaDB  │         │     LLM API     │    │
│                │ vector DB │         │ (Qwen/GLM/...)  │    │
│                └───────────┘         └─────────────────┘    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

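Core features

Smart classification: new notes are tagged automatically and filed into a suitable folder
Semantic search: search notes in natural language, without relying on keyword matching
Related-note recommendations: relevant notes are suggested automatically while you write
Knowledge Q&A: questions are answered from your own note library (RAG)
Automatic summaries: long notes get an automatically generated summary and key points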
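To make the semantic search and RAG items concrete, here is a minimal standard-library sketch of the retrieval step. The three-dimensional vectors are toy stand-ins for real embeddings, and the names `cosine_similarity`, `top_k`, and `build_prompt` are hypothetical helpers for illustration, not part of the system's actual code.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float],
          note_vecs: Dict[str, List[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Rank notes by similarity to the query and keep the best k."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in note_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, contexts: List[str]) -> str:
    """Assemble a RAG-style prompt: retrieved notes first, then the question."""
    joined = "\n---\n".join(contexts)
    return (
        "Answer using only the notes below.\n\n"
        f"Notes:\n{joined}\n\n"
        f"Question: {question}"
    )


# Toy data (assumed values): the vectors are made up so that the two
# infrastructure notes point in a similar direction.
NOTE_VECS = {
    "docker-networking.md": [0.9, 0.1, 0.0],
    "sourdough-recipe.md": [0.0, 0.2, 0.9],
    "kubernetes-ingress.md": [0.8, 0.3, 0.1],
}
NOTE_TEXT = {
    "docker-networking.md": "Bridge networks let containers reach each other by name.",
    "sourdough-recipe.md": "Feed the starter twice a day.",
    "kubernetes-ingress.md": "An Ingress routes external HTTP traffic to services.",
}

question = "How do containers talk to each other?"
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of the question
hits = top_k(query_vec, NOTE_VECS, k=2)
prompt = build_prompt(question, [NOTE_TEXT[name] for name, _ in hits])
print([name for name, _ in hits])
print(prompt)
```

In the real system the vectors would come from the embedding model and the ranking from the vector database; only the shape of the flow (embed, rank, assemble prompt, call the LLM) carries over.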

2. Building the Python Backend Service

2.1 Project structure

obsidian-ai-backend/
├── app/
│   ├── __init__.py
│   ├── main.py               # FastAPI entry point
│   ├── config.py             # Configuration
│   ├── services/
│   │   ├── __init__.py
│   │   ├── embedding.py      # Text embedding service
│   │   ├── vector_store.py   # Vector database service
│   │   ├── llm.py            # LLM service
│   │   ├── indexer.py        # Note indexing service
│   │   └── search.py         # Search service
│   ├── models/
│   │   ├── __init__.py
│   │   └── schemas.py        # Data models
│   └── utils/
│       ├── __init__.py
│       ├── markdown.py       # Markdown parsing helpers
│       └── text_splitter.py  # Text splitting helpers
├── requirements.txt
└── docker-compose.yml

2.2 Core dependencies

# requirements.txt
fastapi==0.115.0
uvicorn==0.30.0
chromadb==0.5.0
openai==1.40.0
sentence-transformers==3.0.0
tiktoken==0.7.0
python-frontmatter==1.1.0
watchfiles==0.23.0
pydantic==2.8.0
pydantic-settings>=2.0  # needed for the `pydantic_settings` import in app/config.py

2.3 Configuration

# app/config.py
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Load overrides from .env; every variable carries the OBSIDIAN_AI_ prefix
    model_config = SettingsConfigDict(env_file=".env", env_prefix="OBSIDIAN_AI_")

    # Obsidian Vault path
    vault_path: str = "/mnt/c/Users/Erpan/Documents/ObsidianVault"

    # Vector database
    chroma_persist_dir: str = "./chroma_data"
    collection_name: str = "obsidian_notes"

    # Embedding model
    embedding_model: str = "BAAI/bge-small-zh-v1.5"  # good on Chinese text, small footprint
    embedding_device: str = "cpu"  # use "cpu" if no GPU is available

    # LLM
    llm_base_url: str = "https://api.siliconflow.cn/v1"
    llm_api_key: str = ""  # read from the environment (OBSIDIAN_AI_LLM_API_KEY)
    llm_model: str = "Qwen/Qwen2.5-7B-Instruct"

    # Text splitting
    chunk_size: int = 512
    chunk_overlap: int = 64

    # File watching
    watch_enabled: bool = True


settings = Settings()
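The prefix means each field is overridden by an environment variable named `OBSIDIAN_AI_` plus the field name, for example `OBSIDIAN_AI_LLM_API_KEY` for `llm_api_key`. The sketch below imitates that mapping with the standard library only, so it runs without pydantic installed; `DemoSettings` and `load_from_env` are hypothetical stand-ins, and the casting is deliberately simplistic (it handles only `str` and `int` fields).

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional

ENV_PREFIX = "OBSIDIAN_AI_"


@dataclass
class DemoSettings:
    """A cut-down, stdlib-only imitation of the Settings class."""
    llm_model: str = "Qwen/Qwen2.5-7B-Instruct"
    llm_api_key: str = ""
    chunk_size: int = 512


def load_from_env(environ: Optional[Mapping[str, str]] = None) -> DemoSettings:
    """Build settings, letting PREFIX + FIELD_NAME variables override defaults."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for field in fields(DemoSettings):
        key = ENV_PREFIX + field.name.upper()
        if key in environ:
            # Cast the raw string using the type of the field's default value.
            caster = type(field.default)
            overrides[field.name] = caster(environ[key])
    return DemoSettings(**overrides)


# Simulated environment (assumed values), equivalent to two lines in .env
fake_env = {
    "OBSIDIAN_AI_LLM_API_KEY": "sk-test",
    "OBSIDIAN_AI_CHUNK_SIZE": "256",
}
demo = load_from_env(fake_env)
print(demo)
```

Keeping the API key out of `config.py` and in `.env` or the process environment also keeps it out of version control.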

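⚠️ Pitfall 1: Choosing an embedding model

# ❌ Not recommended: calling OpenAI's text-embedding-ada-002 directly
#    Problems: high latency, high cost, mediocre results on Chinese text
# ✅ Better: run a local embedding model
# Recommended options:
embedding_options = {
    "bge-small-zh-v1.5": {
        "size": "90MB",
        "dimension": 512,
        "chinese_quality": "★★★★",
        "speed": "very fast",
        "recommended_for": "CPU-only setups, fewer than 5,000 notes",
    },
    "bge-base-zh-v1.5": {
        "size": "400MB",
        "dimension": 768,
        "chinese_quality": "★★★★★",
        "speed": "fairly fast",
        "recommended_for": "GPU setups, more than 5,000 notes",
    },
    "bge-m3": {
        "size": "2.2GB",
        "dimension": 1024,
        "chinese_quality": "★★★★★",
        "speed": "moderate",
        "recommended_for": "high-quality needs, mixed-language notes",
    },
}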
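The dimension also drives how much raw vector data the index has to hold. As a rough back-of-the-envelope sketch, assuming vectors are stored as 32-bit floats and ignoring index and metadata overhead, the size is chunks × dimension × 4 bytes. The chunk count below is an assumption (about 2,000 notes at roughly 5 chunks each), not a measured figure.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_size_mib(num_chunks: int, dimension: int) -> float:
    """Raw float32 vector storage in MiB, excluding any index overhead."""
    return num_chunks * dimension * BYTES_PER_FLOAT32 / (1024 ** 2)


NUM_CHUNKS = 10_000  # assumption: ~2,000 notes x ~5 chunks per note

MODEL_DIMENSIONS = {
    "bge-small-zh-v1.5": 512,
    "bge-base-zh-v1.5": 768,
    "bge-m3": 1024,
}

for name, dim in MODEL_DIMENSIONS.items():
    print(f"{name}: {raw_vector_size_mib(NUM_CHUNKS, dim):.1f} MiB")
```

At this scale even the largest model's vectors fit comfortably in memory, so the choice is more likely to be decided by embedding speed and model size than by storage.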

2.4 Text splitting service

# app/utils/text_splitter.py
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

import frontmatter  # from the python-frontmatter package


@dataclass
class TextChunk:
    content: str
    metadata: dict
    chunk_index: int


class MarkdownTextSplitter:
    """A text splitter designed specifically for Markdown."""

    def __init__(self, chunk_size: int = 512, chunk_overlap: int = 64):
        self.chunk_size = chunk_size
        # Stored, but not yet applied by split() below.
        self.chunk_overlap = chunk_overlap

    def split(self, text: str, metadata: dict) -> List[TextChunk]:
        """
        Split Markdown text in stages:
        1. Split into sections by heading.
        2. If a section is too long, split it by paragraph.
        3. If a paragraph is still too long, split it by sentence.
           (Not implemented in this listing: an oversized paragraph
           is emitted as a single oversized chunk.)
        """
        # Merge frontmatter fields into a copy, leaving the caller's dict untouched
        fm, body = self._extract_frontmatter(text)
        metadata = {**metadata, **fm}

        # Split by heading
        sections = self._split_by_headers(body)
        chunks: List[TextChunk] = []

        for section in sections:
            section_text = section["content"].strip()
            if not section_text:
                continue

            section_metadata = {
                **metadata,
                "section_title": section["title"],
                "section_level": section["level"],
            }

            # A section that is short enough becomes a single chunk
            if len(section_text) <= self.chunk_size:
                chunks.append(TextChunk(
                    content=section_text,
                    metadata=section_metadata,
                    chunk_index=len(chunks),
                ))
                continue

            # Otherwise pack paragraphs greedily up to chunk_size
            current_chunk = ""
            for para in self._split_by_paragraphs(section_text):
                if len(current_chunk) + len(para) <= self.chunk_size:
                    current_chunk += para + "\n\n"
                else:
                    if current_chunk.strip():
                        chunks.append(TextChunk(
                            content=current_chunk.strip(),
                            metadata=section_metadata,
                            chunk_index=len(chunks),
                        ))
                    current_chunk = para + "\n\n"
            if current_chunk.strip():
                chunks.append(TextChunk(
                    content=current_chunk.strip(),
                    metadata=section_metadata,
                    chunk_index=len(chunks),
                ))

        return chunks

    def _extract_frontmatter(self, text: str) -> Tuple[dict, str]:
        """Extract YAML frontmatter; fall back to the raw text if parsing fails."""
        try:
            post = frontmatter.loads(text)
            return dict(post.metadata), post.content
        except Exception:
            return {}, text

    def _split_by_headers(self, text: str) -> List[Dict]:
        """Split by Markdown headings."""
        header_pattern = re.compile(r'^(#{1,6})\s+(.+)$', re.MULTILINE)
        sections = []
        last_end = 0
        last_title = "untitled"
        last_level = 0

        for match in header_pattern.finditer(text):
            # Close the previous section. For the first heading this is the text
            # before it, which the original `if last_end > 0` check dropped.
            # Empty sections are skipped later in split().
            sections.append({
                "title": last_title,
                "level": last_level,
                "content": text[last_end:match.start()],
            })
            last_title = match.group(2)
            last_level = len(match.group(1))
            last_end = match.end()

        # The final section
        sections.append({
            "title": last_title,
            "level": last_level,
            "content": text[last_end:],
        })
        return sections

    def _split_by_paragraphs(self, text: str) -> List[str]:
        """Split on blank lines."""
        paragraphs = re.split(r'\n\s*\n', text)
        return [p.strip() for p in paragraphs if p.strip()]
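The splitter's docstring mentions a sentence-level step, and its constructor accepts `chunk_overlap`, but neither appears in the paragraph-packing logic. One possible way to fill that gap is sketched below. It is an illustration under stated assumptions, not the author's implementation: `split_sentences` and `pack_sentences` are hypothetical names, sentence boundaries are Chinese or ASCII `。！？!?` plus an English period followed by whitespace, and overlap is measured in characters.

```python
import re
from typing import List

# A sentence ends at a CJK/ASCII terminator, at "." followed by whitespace
# (so "v1.5" is not split), or at the end of the text.
_SENTENCE = re.compile(r'.+?(?:[。！？!?]|\.(?=\s|$)|$)', re.DOTALL)


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping the terminator with each sentence."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def pack_sentences(sentences: List[str],
                   chunk_size: int = 512,
                   chunk_overlap: int = 64) -> List[str]:
    """Greedily pack sentences into chunks of at most chunk_size characters.

    Each chunk after the first starts with the last chunk_overlap characters
    of the previous chunk. A single sentence longer than the budget still
    produces an oversized chunk.
    """
    chunks: List[str] = []
    tail = ""   # overlap carried over from the previous chunk
    body = ""   # new content collected for the current chunk
    for sentence in sentences:
        if body and len(tail) + len(body) + len(sentence) > chunk_size:
            chunk = tail + body
            chunks.append(chunk)
            tail = chunk[-chunk_overlap:] if chunk_overlap > 0 else ""
            body = ""
        body += sentence
    if body:
        chunks.append(tail + body)
    return chunks


demo_sentences = split_sentences("今天学习了Docker。网络模式有四种！v1.5 works. Done")
demo_chunks = pack_sentences(["aaaa。", "bbbb。", "cccc。"], chunk_size=10, chunk_overlap=3)
print(demo_sentences)
print(demo_chunks)
```

Inside `split()`, a paragraph longer than `chunk_size` could be routed through `pack_sentences(split_sentences(para), ...)` instead of being appended whole. Character-level overlap can start a chunk mid-sentence; overlapping by whole sentences is a reasonable alternative.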

2.5 Vector database service

# app/services/vector_store.py
import chromadb
from chromadb.config import Settings as ChromaSettings
from typing import List, Dict, Optional
import hashlib
import time


class VectorStoreService:
    """Vector store service backed by ChromaDB."""