
Why Does Enterprise AI Fail? 95% of Enterprise AI Projects Die at the Workflow


Source: BG2 Pod / YouTube
Guests: Ali Ghodsi (Databricks CEO), Arvind Jain (Glean CEO)
Host: Apoorv Agrawal (Altimeter partner)
Total length: 45:00
Blog date: 2025/12/23

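Key Summary

On BG2 Pod, Databricks CEO Ali Ghodsi and Glean CEO Arvind Jain candidly take apart the real obstacles to enterprise AI adoption and the paths around them. The core claim: 95% of AI projects fail not because the technology falls short, but because organizations never embed AI into the workflow. LLMs are commoditizing quickly and becoming as interchangeable as gas stations; the real moats are proprietary data, workflow integration, and agentic systems. The two share real deployments at RBC (automated financial compliance review), Merck (literature review for drug discovery), and 7-Eleven (inventory forecasting), and also admit to failed attempts inside their own companies (Glean's AI prioritization project, Databricks' custom model attempt). The key insight: value capture in enterprise AI happens at the app layer, not the model layer. RPA automated structured data, generative AI automates unstructured data, and only the combination of the two gives the full picture of enterprise automation.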

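1. Consumer AI vs. Enterprise Reality (01:00-02:15)

1.1 The fundamental difference between consumer AI and enterprise AI

Consumer AI: one model (such as ChatGPT) serves a billion users, and the success path is clear
Enterprise AI: the same model has to work across thousands of different workflows, each with its own context
The core difference:
“Consumer AI is about one model serving a billion users. Enterprise AI is about one model serving a billion workflows.”
What makes enterprise AI complex: security, compliance, governance, data privacy, and legacy system integration, none of which consumer AI has to deal with

💡 Food for thought: "one model serving a billion workflows" captures the core challenge of enterprise AI precisely. Consumer AI scales horizontally (across more users), while enterprise AI scales vertically (deeper into each workflow). Does this explain why enterprise AI companies (such as Databricks and Glean) earn far more revenue per customer than consumer AI companies?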

2. Why 95% of AI Projects Fail (02:15-04:15)

2.1 The failure-rate data and the real causes

MIT Research: 95% of enterprise generative AI pilots failed to deliver measurable business value
Only 5% of AI pilot programs achieved rapid revenue acceleration
Arvind Jain reads it differently:
“You hear these 95% of projects fail. That’s actually what you want. When you’re actually experimenting with new technology, if all of your projects are failing, that means you’re not trying enough.”

2.2 The real cause of failure: organization, not technology

It is not that the model is not good enough: GPT-4/Claude are already good enough for most tasks
It is not that there is not enough data: enterprises have tons of data
The real cause: AI was never embedded into the workflow
“It’s not just you can just unleash the agents, and it just works. Making AI effective within an organization is a complex engineering challenge that requires deep integration, careful testing, and strong teams.”
Ali Ghodsi adds:
- Many enterprises treat AI as "plug-and-play": buy an LLM API, expect magic
- What it actually takes: data pipelines, context management, an evaluation framework, human-in-the-loop review, and continuous iteration
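
Two of the ingredients above, the evaluation framework and the human-in-the-loop step, can be illustrated with a minimal sketch. Nothing here comes from the podcast: `classify` is a hypothetical stand-in for a model call, and the 0.9 confidence threshold is an assumed value that a real team would tune.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str
    confidence: float


def classify(text: str) -> Prediction:
    """Hypothetical stand-in for a model call (an assumption, not a real API)."""
    if "invoice" in text.lower():
        return Prediction("finance", 0.95)
    return Prediction("other", 0.55)


def route(text: str, threshold: float = 0.9) -> str:
    """Human-in-the-loop: accept confident predictions, escalate the rest."""
    pred = classify(text)
    return "auto" if pred.confidence >= threshold else "human_review"


def accuracy(examples: list[tuple[str, str]]) -> float:
    """Evaluation: fraction of labeled examples whose predicted label matches."""
    correct = sum(classify(text).label == expected for text, expected in examples)
    return correct / len(examples)
```

Running `accuracy` on a labeled sample before every change is one simple way to practice the continuous iteration mentioned above; the threshold in `route` controls how much work still reaches a human reviewer.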

2.3 What the successful 5% got right

They embedded AI into an existing workflow instead of creating a new one
They focused on one specific use case, perfected it, then expanded
They invested in data infrastructure before AI

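3. RBC, Merck, and 7-Eleven Use Cases (04:15-06:45)

3.1 RBC (Royal Bank of Canada): automating financial compliance review

Problem: the compliance team has to review thousands of financial documents daily
Solution: an AI agent automatically reads, classifies, and flags anomalous documents
Result:
- Processing time dropped from 4 hours to 15 minutes
- Accuracy rose from 85% (manual) to 97%
- Human reviewers moved from "reader" to "validator"
Key insight: AI did not replace people; it changed their role

3.2 Merck: literature review for drug discovery

Problem: the drug discovery team has to review millions of scientific papers
Solution: an AI agent automatically summarizes papers, extracts key findings, and identifies patterns
Result:
- Literature review time dropped from 3 months to 2 weeks
- It surfaced 3 potential drug interactions that human researchers had missed
Key insight: AI beats humans at "read everything", but "judge what matters" still needs humans

3.3 7-Eleven: inventory forecasting

Problem: inventory management across 80,000+ SKUs, with overstock and stockouts occurring at the same time
Solution: an AI agent analyzes sales data, weather, local events, and supplier lead times
Result:
- Inventory turnover up 23%
- Stockout rate down 40%
- Expiry losses down 15%
Key insight: AI's value lies in integrating multiple data sources that humans can’t process simultaneously

💡 Food for thought: what do the three cases have in common? Not "AI replaced humans" but "AI changed what humans do". RBC's reviewers went from reader to validator, Merck's researchers from reader to strategist, and 7-Eleven's managers from data cruncher to decision maker. Does this mean the right narrative for enterprise AI is "augmentation" rather than "automation"?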

4. What Actually Makes AI Work (06:45-08:45)

4.1 Three success factors

Factor	Description	Why it matters
Proprietary Data	Data only the enterprise has: customer records, transaction history, internal documents	LLMs are a commodity, but your data is not
Workflow Integration	AI embedded into the existing workflow, without creating a new one	Users do not need to change behavior
Agentic Systems	AI that can take action on its own, not just generate text	From "assistant" to "executor"

4.2 Ali Ghodsi's framework

Data is the moat:
“LLMs are like gas stations. They’re everywhere, they’re interchangeable. Your proprietary data is your oil well.”
Workflow is the castle: without workflow integration, AI is an isolated tool, not a system
Agents are the army: agents move AI from "suggest" to "do"

5. Failed AI Bets at Databricks & Glean (08:45-11:00)

5.1 Glean's failure: AI prioritization

Project: have AI automatically identify each employee's top weekly priorities and roll them up for leadership
Why it seemed easy: “It has all the context inside the company to make it happen”
Why it failed:
- Priority is subjective: what’s “important” varies by person, by week, by context
- AI cannot capture implicit knowledge ("I know this matters, but I can't articulate why")
- There was a gap between leadership's expectations and AI's capability
Lesson:
“It actually takes much longer than you know to actually generate success.”

5.2 Glean's other failure: a custom AI model

Project: build a custom AI model for a specific product function
Why it failed:
- Fine-tuning cost more than expected
- Maintenance cost was too high
- Foundation models (GPT-4/Claude) improved faster than the custom model could iterate
Lesson: return to foundation models, which are less tailored but more reliable and easier to implement

5.3 Databricks' failure: going agentic too early

Project: launched an autonomous data agent in early 2024
Why it failed:
- Enterprise customers were not ready: governance, trust, and audit trails were all immature
- The cost of agent hallucination is too high in an enterprise context
- Customers wanted human-in-the-loop, not full autonomy
Lesson: enterprise AI has to prove reliability first, then be granted autonomy

💡 Food for thought: both CEOs openly admitting failure is a valuable signal in itself. Many enterprise AI failures happen not because "AI isn't good enough" but because "the organization wasn't ready" or "the wrong use case was chosen". Glean's failed prioritization project exposes a fundamental limit of AI in subjective judgment, which is exactly where human judgment earns its value.

6. RPA vs. Generative AI (11:00-14:15)

6.1 The limits of RPA (Robotic Process Automation)

What RPA solves: automation over structured data
- Fixed rules, fixed input/output, deterministic
- Examples: copying data from system A to system B, form filling
RPA's bottlenecks:
- Every UI change requires reconfiguration
- It cannot handle unstructured data (email, documents, conversations)
- Maintenance cost grows linearly with the number of processes

6.2 How generative AI complements it

What generative AI solves: automation over unstructured data
- Email summarization, document extraction, conversation analysis
- It can understand context and handle variability
Only the combination gives the full picture:
“RPA handles the structured, repetitive tasks. GenAI handles the unstructured, cognitive tasks. Together, they’re the full stack of enterprise automation.”
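
The division of labor described above can be sketched as a small dispatcher. This is only an illustration under stated assumptions: "structured" is approximated as "parses as a JSON object", the `id` field is a made-up example schema, and `handle_with_genai` is a placeholder where a real system would call a model.

```python
import json


def is_structured(payload: str) -> bool:
    """Assumption: a payload counts as structured if it parses as a JSON object."""
    try:
        return isinstance(json.loads(payload), dict)
    except json.JSONDecodeError:
        return False


def handle_with_rules(record: dict) -> str:
    # Deterministic, RPA-style step: fixed input field, fixed output.
    return f"copied {record['id']} to target system"


def handle_with_genai(text: str) -> str:
    # Placeholder for a model call; here it only truncates the text.
    return f"summary: {text[:40]}"


def dispatch(payload: str) -> str:
    """Send structured payloads to fixed rules, everything else to the model."""
    if is_structured(payload):
        return handle_with_rules(json.loads(payload))
    return handle_with_genai(payload)
```

For example, `dispatch('{"id": 7}')` takes the rule-based path, while a free-text email body falls through to the generative path.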

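6.3 Ali Ghodsi's prediction

RPA companies (UiPath, Automation Anywhere) will be displaced by "AI-native workflow automation"
RPA technology itself will not become obsolete; rather, RPA as a standalone category will disappear, because all workflow automation will incorporate AI
Timeline: 2-3 years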

7. Advice for CIOs Planning AI Budgets (14:15-16:00)

7.1 Arvind Jain's advice to CIOs

Rule #1: Start with data infrastructure
- If the data is messy, the AI will be messy
- Invest in data cleaning, data governance, and data accessibility first
Rule #2: Pick one use case, make it work, then expand
- Don’t try to “AI everything” at once
- Success breeds success: one win builds organizational confidence
Rule #3: Measure outcome, not output
- Don’t measure “how many AI models deployed”
- Measure “how much time saved”, “how much revenue increased”, “how many errors reduced”

7.2 Ali Ghodsi adds

Suggested budget split:
- 60% data infrastructure
- 20% perfecting one use case
- 20% experimentation
Most common mistake: giving 80% of the budget to AI models and 20% to data, when it should be the other way around
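
As a quick worked example, the 60/20/20 split can be applied to any total. The function name, the category keys, and the $1,000,000 figure used in the note below are illustrative assumptions, not numbers from the episode.

```python
def split_budget(total: float) -> dict[str, float]:
    """Apply the 60/20/20 split from the text to a total AI budget."""
    shares_pct = {
        "data_infrastructure": 60,
        "one_use_case": 20,
        "experimentation": 20,
    }
    return {name: total * pct / 100 for name, pct in shares_pct.items()}
```

With a hypothetical total of 1,000,000, `split_budget(1_000_000)` allocates 600,000 to data infrastructure and 200,000 to each of the other two buckets.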

8. AI CapEx and the Revenue Math (16:00-18:00)

8.1 The payback period for AI investment

Year 1: usually net negative (infrastructure investment, training, failures)
Year 2: break-even or slightly positive
Year 3+: compounding returns, with each new use case cheaper than the last
Ali Ghodsi's analogy:
“AI investment is like building a factory. You don’t expect ROI in month one. You expect ROI when the factory is running at full capacity.”

8.2 The revenue math

Databricks' numbers:
- AI product revenue: $1B+ (run-rate)
- Platform stickiness increases 3x after customers adopt AI
- AI customers have 2x higher NRR (Net Revenue Retention) than non-AI customers
Key insight: AI is not a cost center, it is a retention driver

9. The Three Camps of AI (18:00-21:00)

9.1 Where value sits in enterprise AI

Layer	Representative companies	Value capture	Durability
Model layer	OpenAI, Anthropic, Google	High today, compressing	Low: commoditizing fast
Infrastructure layer	Databricks, Snowflake, AWS	Medium today, growing	Medium: platform lock-in
Application layer	Glean, Salesforce, Vertical SaaS	Low today, growing explosively	High: workflow lock-in

9.2 Why the app layer ends up capturing the most value

Arvind Jain's argument:
“The value in enterprise AI accrues to the app layer. Models are commodities. Infra is necessary but not sufficient. The company that owns the workflow owns the customer.”
Analogy:
- Model layer = Intel (chips): important but not where value accrues
- Infra layer = Windows (operating system): necessary platform
- App layer = Office (applications): where users actually work and value is created

9.3 Ali Ghodsi's amendment

He agrees the app layer holds the most value, but argues the infra layer (such as Databricks) is the enabler of the app layer
Databricks' strategy: become the "platform for AI apps" and let vertical SaaS companies build on Databricks
Win-win: Databricks gets platform revenue, vertical SaaS gets AI capability without building infra

10. Making AI Useful Inside Enterprises (21:00-24:30)

10.1 What workflow integration really means

It is not "add an AI button": many enterprises mistake AI integration for “add a chatbot to our app”
Real integration: AI is embedded invisibly into every step of the workflow
- Email: AI auto-summarize, auto-draft, auto-schedule
- CRM: AI auto-log, auto-prioritize, auto-suggest next action
- Finance: AI auto-reconcile, auto-flag anomaly, auto-generate report
Goal: users do not need to "use AI"; AI simply makes their existing work better

10.2 Glean's experience in practice

Glean's product: enterprise search + AI assistant
Insight from deployment:
- The most successful customers are not the ones who "aggressively use AI features"
- They are the ones where "AI quietly improves their existing workflow"
Adoption metric: not "how many people click the AI button"
- But "how much time saved per user per week"
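
The adoption metric above is easy to compute once time savings are logged. A minimal sketch, assuming a hypothetical log of `(user, minutes_saved)` events collected over a known number of weeks; the event format and names are illustrative, not Glean's.

```python
from collections import defaultdict


def minutes_saved_per_user_per_week(
    events: list[tuple[str, float]], weeks: int
) -> dict[str, float]:
    """Average minutes saved per week for each user over the logged period."""
    totals: dict[str, float] = defaultdict(float)
    for user, minutes in events:
        totals[user] += minutes
    return {user: total / weeks for user, total in totals.items()}
```

Unlike a click count, this number stays at zero for a user who opens the AI feature every day but saves no time, which is the distinction the metric is meant to capture.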

11. Why Apps Capture the Value (24:30-30:00)

11.1 Where AI value ultimately flows

Arvind Jain's core argument:
“In the long run, all the value in AI flows to the application layer. Models become commodities. Infrastructure becomes invisible. What remains is the app that owns the workflow.”
Evidence:
- PC era: value flowed to Microsoft Office, not Intel or Windows
- Mobile era: value flowed to Uber/Airbnb/WeChat, not iOS or ARM
- AI era: value will flow to workflow apps, not LLM or cloud

11.2 The "last mile" problem of enterprise AI

Model capability ≠ Business value
Getting from model to value requires:
1. Data integration (connecting enterprise data)
2. Workflow embedding (embedding into the workflow)
3. Trust building (establishing trust)
4. Change management (managing organizational change)
App layer companies (such as Glean and Salesforce) have already solved #3 and #4
Infra layer companies (such as Databricks) have solved #1 and #2
The future: the two converge, as infra companies build apps and app companies build infra

12. The Future of UI, Voice, and Data Entry (30:00-37:30)

12.1 The paradigm shift in UI

Today: GUI (Graphical User Interface): click, type, scroll
Future: LUI (Language User Interface): talk, ask, command
Arvind Jain's prediction:
“In 5 years, 50% of enterprise software interactions will be through natural language.”
But: LUI will not completely replace GUI, since complex tasks (such as data visualization) still need a visual interface

12.2 Enterprise scenarios for voice interaction

Best fit: hands-free settings such as warehouse, factory, and field service
Worst fit: quiet office environments (privacy concerns)
Key barrier: enterprise security, because voice data is sensitive

12.3 The future of data entry

Today: a human types data into the system
Future: AI auto-extracts data from conversations, documents, and activity
Implication: “data entry” as a job category will disappear
“The concept of ‘entering data’ will seem as quaint as ‘typing memos’ seems today.”

13. Rapid Fire: Winners, Bubbles, Long/Short (37:30-45:00)

13.1 Predicted winners

Ali Ghodsi: Databricks (😄) + vertical AI applications in healthcare and legal
Arvind Jain: Glean (😄) + companies that own the workflow in regulated industries

13.2 Bubble calls

Ali Ghodsi: AI infra valuations are in a bubble, with $100B+ valuations for companies with <$5B revenue
“The infra layer is overvalued. The app layer is undervalued. That’s the trade.”
Arvind Jain: agrees, and sees the model layer as especially bubbly
- OpenAI $300B valuation on $5B revenue = 60x revenue multiple
- Historical precedent: Cisco at the peak of the dot-com bubble was 50x revenue, then crashed 80%
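
The arithmetic behind these two bullets is simple enough to check directly. The sketch below uses only the figures quoted above; the function names are illustrative, and the drawdown helper works on an arbitrary starting value rather than any company's actual market cap.

```python
def revenue_multiple(valuation_usd_b: float, revenue_usd_b: float) -> float:
    """Valuation divided by revenue, both in billions of US dollars."""
    return valuation_usd_b / revenue_usd_b


def value_after_drawdown(value: float, drawdown_pct: float) -> float:
    """Value remaining after a percentage decline, e.g. 80 for an 80% crash."""
    return value * (100 - drawdown_pct) / 100
```

`revenue_multiple(300, 5)` reproduces the 60x figure, and `value_after_drawdown(100, 80)` shows that an 80% crash leaves 20 of every 100 units of value.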

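13.3 Long/Short

Name	Call	Rationale
OpenAI	Short (Arvind) / Neutral (Ali)	Model commoditization + high valuation
Databricks	Long (Ali 😄)	AI application platform + data moat
Glean	Long (Arvind 😄)	Workflow ownership + enterprise trust
UiPath	Short (both)	RPA is being disrupted by AI-native automation
Vertical AI Apps	Long (both)	Owned workflows + domain expertise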

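Summary of Key Points

Key numbers

95%: failure rate of enterprise generative AI pilots (MIT Research)
5%: share of AI pilots that achieved rapid revenue acceleration
$1B+: Databricks AI product revenue run-rate
3x: increase in platform stickiness among AI customers
2x: how much higher NRR (Net Revenue Retention) is for AI customers than for non-AI customers
50%: Arvind Jain's predicted share of interactions through natural language in 5 years
60x: OpenAI's valuation-to-revenue multiple ($300B / $5B)

Key takeaways

The 95% failure rate is a feature, not a bug: a high failure rate shows that enterprises are actively probing the boundaries
The real cause of failure is organizational, not technical: AI was never embedded into the workflow
LLMs are commoditizing: as interchangeable as gas stations, with the moat in the data
Value ultimately flows to the app layer: model and infra are necessary but not sufficient
RPA + generative AI = the full automation picture: complete coverage of structured and unstructured data
AI is a retention driver, not a cost center: stickiness and NRR are markedly higher for AI customers
Infra layer valuations are in a bubble and the app layer is undervalued: $100B+ infra valuations vs <$10B app valuations
CIOs should put 60% of the budget into data infrastructure, not into AI models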