当前位置: 首页 > news >正文

多智能体搜索算法Python实现 CS188 Proj2 学习笔记

更好的阅读体验Q1.Reflex AgentProj2的开头是一个完成评估函数的问题评估函数的思想非常类似与启发式的思想。相较与Proj1不同的是该过程添加了对抗智能体还有ScaredTime的设定。完成评估函数的时候首先考虑就是评估函数的组成要素其中可以大致表示为总分 游戏自带分数 食物相关奖励 scared鬼奖励 - stop惩罚 - 危险鬼惩罚原评估函数在Reflex Agent类中的scoreEvaluationFunction中实现其只是返回了当前状态的分数缺少很多必要的组成元素可以用下列指令在shell中运行原评估函数的gui版本python pacman.py-pReflexAgent-ltestClassic很容易发现的是pacman在许多情况下采取了action集中的stop动作。这种行为及其拉低效率所以把当采取stop这一步的影响也要算入评估函数的表达式中代码实现defevaluationFunction(self,currentGameState:GameState,action): Design a better evaluation function here. The evaluation function takes in the current and proposed successor GameStates (pacman.py) and returns a number, where higher numbers are better. The code below extracts some useful information from the state, like the remaining food (newFood) and Pacman position after moving (newPos). newScaredTimes holds the number of moves that each ghost will remain scared because of Pacman having eaten a power pellet. Print out these variables to see what youre getting, then combine them to create a masterful evaluation function. # Useful information you can extract from a GameState (pacman.py)successorGameStatecurrentGameState.generatePacmanSuccessor(action)newPossuccessorGameState.getPacmanPosition()newFoodsuccessorGameState.getFood()newGhostStatessuccessorGameState.getGhostStates()newScaredTimes[ghostState.scaredTimerforghostStateinnewGhostStates]*** YOUR CODE HERE ***# print(Action: , action)# print(Pacmans position: , newPos)# print(Food count: , newFood.count())# print(Ghost positions: , [g.getPosition() for g in newGhostStates])# print(Scared times: , newScaredTimes)# print(- * 30)scoresuccessorGameState.getScore()ifactionDirections.STOP:score-10foodListnewFood.asList()iffoodList:minFoodDistmin(manhattanDistance(newPos,food)forfoodinfoodList)score10.0/minFoodDistforghostStateinnewGhostStates:ghostPosghostState.getPosition()distmanhattanDistance(ghostPos,newPos)ifghostState.scaredTimer0:ifdist0:score5.0/distelse:ifdist1:score-200else:score-5.0/distreturnscore在Q1原文的提示中提及到了尝试数值的倒数而不是数值本身在本阶段的代码实现也有体现Note: As features, try the reciprocal of important values (such as distance to food) rather than just the values themselves.最终结果Record: Win, Win, Win, Win, Win, Win, Win, Win, Win, Win *** PASS: test_cases/q1/grade-agent.test (4 of 4 points) *** 1278.3 average score (2 of 2 points) *** Grading scheme: *** 500: 0 points *** 500: 1 points *** 1000: 2 points *** 10 games not timed out (0 of 0 points) *** Grading scheme: *** 10: fail *** 10: 0 points *** 10 wins (2 of 2 points) *** Grading scheme: *** 1: fail *** 1: 0 points *** 5: 1 points *** 10: 2 points ### Question q1: 4/4 ### Finished at 20:20:52 Provisional grades Question q1: 4/4 ------------------ Total: 4/4在Q1测试中是可以完美拿到满分的但是我运行了shell指令更换其他的大型复杂layout,发现智能体在后期很容易出现反复踱步的问题。经过AI的详细解答后总结原因得知决策规则里有一行关键代码影响了智能体的行动决策过程chosenIndexrandom.choice(bestIndices)到后期之后食物减少的同时鬼的距离也变远两个方向的1/distance几乎一样而且ghost的距离也差不多总体来说多个action的分数会出现一样的情况。上述代码面对这种情况的时候会出现A → B → A → B的决策过程掉入Note4中提及到的局部最优震荡陷阱,每一步都是最优但整体状况上不前进。从根源上讲这和ReflexAgent天生没有长期记忆和规划能力有关系缓解这个问题的解决方式可以是当检测到新位置和当前位置重复连续出现太多次进行扣分操作Q2.Minimax首先可以回忆一下Note5中Minimax的伪代码在实现文件中代码已经提供了可能用到的函数非常全面。需要注意的是minimax树中的叶节点用self.evaluationFunction来评判状态分数代码实现classMinimaxAgent(MultiAgentSearchAgent): Your minimax agent (question 2) defgetAction(self,gameState:GameState): Returns the minimax action from the current gameState using self.depth and self.evaluationFunction. Here are some method calls that might be useful when implementing minimax. gameState.getLegalActions(agentIndex): Returns a list of legal actions for an agent agentIndex0 means Pacman, ghosts are 1 gameState.generateSuccessor(agentIndex, action): Returns the successor game state after an agent takes an action gameState.getNumAgents(): Returns the total number of agents in the game gameState.isWin(): Returns whether or not the game state is a winning state gameState.isLose(): Returns whether or not the game state is a losing state *** YOUR CODE HERE ***defminimax(state,agentIndex,depth):ifstate.isWin()orstate.isLose()ordepthself.depth:returnself.evaluationFunction(state)numAgentsstate.getNumAgents()ifagentIndex0:value-float(inf)foractioninstate.getLegalActions(0):successorstate.generateSuccessor(0,action)valuemax(value,minimax(successor,1,depth))returnvalueelse:valuefloat(inf)nextAgentIndexagentIndex1foractioninstate.getLegalActions(agentIndex):successorstate.generateSuccessor(agentIndex,action)ifnextAgentIndexnumAgents:valuemin(value,minimax(successor,0,depth1))else:valuemin(value,minimax(successor,nextAgentIndex,depth))returnvalue bestScore-float(inf)bestActionNoneforactioningameState.getLegalActions():successorgameState.generateSuccessor(0,action)scoreminimax(successor,1,0)ifscorebestScore:bestScorescore bestActionactionreturnbestAction这里一定要注意depth的逻辑一次搜索是一次pacman和所有ghost的反应即depth为2时涉及pacman和每个ghost移动两次这里我想了一下代码具体运行过程才能想明白假设self.depth是1,也就是pacman和ghost的移动次数都为1。最开始函数调用传入的minimax(successor, 1, 0),到运行到最后一个ghost智能体时则会执行value min(value, minimax(successor, 0, depth 1))再运行则会触发终止条件depth self.depth,返回状态的分数return self.evaluationFunction()结束递归嵌套,所以说最终depth还是为1的Q3.Alpha-Beta Pruning依旧可以回忆一下Note5中Alpha-Beta Pruning的伪代码有个小细节需要注意的是在Proj2的文档中函数的判断条件稍微改了一下如果不按下面的伪代码实现的话最后autograder没法通过6-tied-root.test*** FAIL: test_cases/q3/6-tied-root.test *** Incorrect generated nodes for depth3 *** Student generated nodes: A B max min1 min2 *** Correct generated nodes: A B C max min1 min2 *** Tree: *** max *** / \ *** min1 min2 *** | / \ *** A B C *** 10 10 0代码实现classAlphaBetaAgent(MultiAgentSearchAgent): Your minimax agent with alpha-beta pruning (question 3) defgetAction(self,gameState:GameState): Returns the minimax action using self.depth and self.evaluationFunction *** YOUR CODE HERE ***defalphabeta(state,alpha,beta,depth,agentIndex):ifstate.isWin()orstate.isLose()ordepthself.depth:returnself.evaluationFunction(state)numAgentsstate.getNumAgents()ifagentIndex0:value-float(inf)foractioninstate.getLegalActions(0):successorstate.generateSuccessor(0,action)valuemax(value,alphabeta(successor,alpha,beta,depth,1))ifvaluebeta:returnvalue alphamax(alpha,value)returnvalueelse:valuefloat(inf)nextAgentIndexagentIndex1foractioninstate.getLegalActions(agentIndex):successorstate.generateSuccessor(agentIndex,action)ifnextAgentIndexnumAgents:valuemin(value,alphabeta(successor,alpha,beta,depth1,0))else:valuemin(value,alphabeta(successor,alpha,beta,depth,nextAgentIndex))ifvaluealpha:returnvalue betamin(beta,value)returnvalue bestScore-float(inf)bestActionNonealpha-float(inf)betafloat(inf)foractioningameState.getLegalActions(0):successorgameState.generateSuccessor(0,action)scorealphabeta(successor,alpha,beta,0,1)ifscorebestScore:bestScorescore bestActionaction alphamax(alpha,bestScore)returnbestAction代码实现的整体框架和Minimax基本没有差异唯一需要注意的点就是根节点作为最大化节点也是需要剪枝的所以说一定不能忘了倒数第二行的alpha max(alpha, bestScore)这里还需加强理解一下alpha和beta在各个阶段的具体数值才能深刻理解alpha-beta pruning的运行过程拿下图来举个例子!最开始根节点值为None,alpha -float(inf)且beta float(inf)走最左子树min1开始遍历自己的一个子节点(3)遍历过后更新min1的beta 3之后遍历后两个子节点(12和6)得到min1的值已知min1的值为3,根节点的alpha开始更新执行alpha max(alpha, bestScore)语句此刻根节点的alpha 3遍历中间子树根节点alpha的值传了下来当遍历到中间子树的第一个子节点(2)触发判断语句value alpha就直接退出此次遍历了Q4.Expectimax依旧依旧是可以回顾一下Note6中的伪代码代码实现classExpectimaxAgent(MultiAgentSearchAgent): Your expectimax agent (question 4) defgetAction(self,gameState:GameState): Returns the expectimax action using self.depth and self.evaluationFunction All ghosts should be modeled as choosing uniformly at random from their legal moves. *** YOUR CODE HERE ***defexpectimax(state,agentIndex,depth):ifstate.isWin()orstate.isLose()ordepthself.depth:returnself.evaluationFunction(state)agentNumstate.getNumAgents()ifagentIndex0:value-float(inf)foractioninstate.getLegalActions(0):successorstate.generateSuccessor(0,action)valuemax(value,expectimax(successor,1,depth))else:value0actionsstate.getLegalActions(agentIndex)nextAgentIndexagentIndex1foractioninstate.getLegalActions(agentIndex):successorstate.generateSuccessor(agentIndex,action)probability1/len(actions)ifnextAgentIndexagentNum:valueprobability*expectimax(successor,0,depth1)else:valueprobability*expectimax(successor,nextAgentIndex,depth)returnvalue bestActionNonebestScore-float(inf)foractioningameState.getLegalActions(0):successorgameState.generateSuccessor(0,action)scoreexpectimax(successor,1,0)ifscorebestScore:bestScorescore bestActionactionreturnbestAction util.raiseNotDefined()Q4这个问题和前面几个问题代码基本上都是一个模板直接套就行唯一需要注意的地方是有关probability的代码对手将在getLegalActions中纯随机选择,所以probability的实现是probability 1 / len(actions),原文档如下To simplify your code, assume you will only be running against an adversary which chooses amongst their getLegalActions uniformly at random.原文档还提到了AlphaBetaAgent(以Minimax为基础)和ExpectimaxAgent的运行时候的区别python pacman.py-pAlphaBetaAgent-ltrappedClassic-adepth3-q-n10python pacman.py-pExpectimaxAgent-ltrappedClassic-adepth3-q-n10比较容易观察发现的现象是ExpectimaxAgent可以在多局比赛中赢得几乎一半的数量AlphaBetaAgent总是输。导致这个现象的根本原因就是在Note6开头中提及到的Minimax太过悲观,AlphaBetaAgent总是假设对手会采取最优行动所以pacman会直接放弃争取机会。而Expectimax中认定ghost是随机行动在计算期望值后会在关键时刻决定赌一把Q5.Evaluation Function和Q1问题相同都是完成Evaluation Function函数传入的参数里面没有action,所以没有stop惩罚了感觉没有了灵魂。我的整体思路是总分 游戏自带分数 最近食物距离 - 食物数量(长期目标) - 普通鬼惩罚 可吃鬼奖励 - 胶囊距离惩罚代码实现defbetterEvaluationFunction(currentGameState:GameState): Your extreme ghost-hunting, pellet-nabbing, food-gobbling, unstoppable evaluation function (question 5). DESCRIPTION: write something here so we know what you did *** YOUR CODE HERE ***poscurrentGameState.getPacmanPosition()foodcurrentGameState.getFood()ghostStatescurrentGameState.getGhostStates()scaredTimes[ghostState.scaredTimerforghostStateinghostStates]capsulescurrentGameState.getCapsules()scorecurrentGameState.getScore()foodListfood.asList()iffoodList:distToClosestFoodmin(manhattanDistance(pos,food)forfoodinfoodList)score20.0/distToClosestFoodelse:score1000score-5*len(foodList)ifcapsules:distToClosestCapsulemin(manhattanDistance(pos,capsule)forcapsuleincapsules)score5.0/distToClosestCapsule score-20*len(capsules)forghostinghostStates:ghostPosghost.getPosition()ghostDistmanhattanDistance(pos,ghostPos)ifghost.scaredTimer0:score100.0/ghostDistelse:ifghostDist1:score-500elifghostDist3:score-200elifghostDist5:score-100returnscore基本思路和Q1是一样的没有了Stop的惩罚让pacman在执行动作过程中会出现特别特别多的无厘头停止动作启发式内容仍需要优化但是可以pass autograder
http://www.gsyq.cn/news/1352030.html

相关文章:

  • 解决Keil MDK中Arm Compiler V6.6.1许可错误
  • python社区技术论坛交流平台
  • Context Engineering 2026:超越Prompt工程的下一个AI能力边界
  • 魔兽争霸III终极优化指南:WarcraftHelper完整解决方案
  • LLM评估体系工程2026:超越“感觉不错“的科学评估方法
  • AMBA CHI协议SACTIVE信号机制与低功耗设计解析
  • 为什么你的Agent总在真实场景中“失语”?揭秘LLM调用链中被忽略的2个关键中间态(Meta Llama-3.1内部调试日志首度公开)
  • 贵州方言语音AI落地难?从数据采集、音素映射到MOS评分提升至4.1的5步攻坚法
  • ElevenLabs江苏话语音合规指南(含网信办2024方言AI备案清单):3类禁用场景、5项声纹脱敏强制要求与审计日志模板
  • 【编号120】珠江三角洲城市群区域开发密度数据
  • AirPodsDesktop:在Windows上解锁苹果耳机的完整体验
  • Meta 裁员约 8000 人:弥补 AI 巨额投资,削减人力成本
  • Spotify推AI应用Studio,结合多信息源生成简报、播客和歌单!能“代你行动”
  • 2026年靠谱的铣刀/东莞钨钢铣刀深度厂家推荐 - 品牌宣传支持者
  • Arm DS中Git集成与嵌入式开发版本控制实践
  • 从零到一:手把手教你用MounRiver Studio配置沁恒CH32V208工程(附官方例程结构解析)
  • 不止于Windows:用QtService源码打造跨平台(Windows/Linux)守护进程的实践指南
  • 【c++面向对象编程】第45篇:萃取(Traits)技术与策略类:STL源码中的智慧
  • 别再只用Graphics2D了!5个Java图片缩放方案实战评测:从Thumbnailator到OpenCV,谁画质最好?
  • 基于SpringBoot2+vue2的流浪宠物管理系统
  • ArcGIS Desktop 10.2 安装后必做的5件事:从激活分析拓展到优化地图性能
  • 2026年如何开通小程序店铺
  • 锂电池健康评估:避开NASA/Oxford数据IC分析中的三个常见坑(滤波、异常值、容量增生)
  • 使用Taotoken后我的大模型API月度账单清晰可见
  • 别再踩坑了!Windows 10下CUDA 11.3与cuDNN 8.2.1保姆级配置指南(附TensorFlow/PyTorch版本对照表)
  • 蓝桥杯嵌入式实战:手把手教你用STM32CubeMX和HAL库封装PWM控制函数(调频调占空比)
  • 从炼丹到炼蛋白:手把手拆解AlphaFold2的模型架构与训练技巧
  • 2026年评价高的LED 薄膜开关/东莞定制薄膜开关厂家综合对比分析 - 行业平台推荐
  • 食品车间阁楼平台这样搭,一次通过SC审查不走弯路
  • 保姆级教程:SAP资产折旧调错了怎么办?手把手教你用AB08和反向事务类型回退操作