Multimodal Agent
- Author: Lucas Xu (@xianminx)
From perception to action
Course: https://llmagents-learning.org/sp25
Course video: https://www.youtube.com/live/n__Tim8K2IY
Vision-language-action models (VLA-Ms)
- Coding agents
- Web agents
- Mobile agents
- Physical agents
Components
- Environment/Benchmark: should be reconfigurable and expandable
- Data: diverse modalities, large-scale, covering a wide range of tasks
- Model/System: unified vision-language-reasoning-action model and long-context inference
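To make the three components concrete, here is a minimal Python sketch of a perception-to-action loop. It is not taken from the lecture: `Observation`, `Action`, `ToyFormEnv`, `run_episode`, and `scripted_policy` are all hypothetical names invented for illustration. The environment takes its task as a constructor argument (so it can be reconfigured), the policy stands in for the model, and the returned list of actions is the kind of trajectory that could be collected as data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Observation:
    """What the agent perceives at one step (hypothetical fields)."""
    text: str                # e.g. a serialized UI description
    screenshot: bytes = b""  # raw image bytes; unused in this toy example


@dataclass
class Action:
    """One step taken by the agent (hypothetical schema)."""
    kind: str                # e.g. "click" or "type"
    target: str = ""
    value: str = ""


@dataclass
class ToyFormEnv:
    """Toy environment: succeed by typing `required` into a text box.

    `required` is a constructor argument, so the same class can be
    reconfigured into different tasks.
    """
    required: str = "hello"
    typed: str = ""

    def reset(self) -> Observation:
        self.typed = ""
        return Observation(text=f"textbox 'query' is empty; goal: type {self.required!r}")

    def step(self, action: Action) -> tuple[Observation, bool]:
        if action.kind == "type":
            self.typed = action.value
        done = self.typed == self.required
        return Observation(text=f"textbox 'query' contains {self.typed!r}"), done


def run_episode(env: ToyFormEnv,
                policy: Callable[[Observation], Action],
                max_steps: int = 5) -> list[Action]:
    """Run the perceive -> decide -> act loop and return the action trajectory."""
    obs = env.reset()
    trajectory: list[Action] = []
    for _ in range(max_steps):
        action = policy(obs)          # decide (a real agent would call a model here)
        trajectory.append(action)
        obs, done = env.step(action)  # act, then perceive the new state
        if done:
            break
    return trajectory


def scripted_policy(obs: Observation) -> Action:
    """Stand-in for a model: always types the same string."""
    return Action(kind="type", target="query", value="hello")


solved = run_episode(ToyFormEnv(required="hello"), scripted_policy)
unsolved = run_episode(ToyFormEnv(required="bye"), scripted_policy)
```

With the default task the scripted policy finishes in one step. Reconfiguring the environment to require a different string makes the same policy run until `max_steps`, which is the sort of task variation a reconfigurable benchmark is meant to allow.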
Computer use
- Mind2Web
- WebArena
- OSWorld (observations include the accessibility (a11y) tree: https://developer.mozilla.org/en-US/docs/Glossary/Accessibility_tree)
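As a rough illustration of how an accessibility tree can become a text observation for an agent, the sketch below flattens a small tree into indented lines. The `A11yNode` class and `serialize` function are hypothetical, and the role/name fields only loosely mirror what real accessibility trees expose. This is not OSWorld's actual format.

```python
from dataclasses import dataclass, field


@dataclass
class A11yNode:
    """Simplified accessibility node (hypothetical schema)."""
    role: str                                   # e.g. "button", "checkbox"
    name: str = ""                              # accessible name; may be empty
    children: list["A11yNode"] = field(default_factory=list)


def serialize(node: A11yNode, depth: int = 0) -> list[str]:
    """Return one indented line per node, in depth-first order."""
    label = f"{node.role} {node.name!r}" if node.name else node.role
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(serialize(child, depth + 1))
    return lines


tree = A11yNode("window", "Settings", [
    A11yNode("button", "Save"),
    A11yNode("group", "", [A11yNode("checkbox", "Dark mode")]),
])

# A compact text view that a language model could read alongside a screenshot.
observation = "\n".join(serialize(tree))
print(observation)
```

A text view like this is far smaller than a screenshot and names each interactive element explicitly, which is one reason an agent might read it alongside the raw pixels.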
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
Humanity's Last Exam (HLE)
AGI, short for artificial general intelligence, is a rather awkward and strange term. Why this wording? Why not "general artificial intelligence," or simply "superintelligence"?