Extended Abstracts

Red Session

R-01

  A central problem in artificial intelligence is to plan to maximizefuture reward under uncertainty in a partially observable environment. Models of such environments include Partially ObservableMarkov Decision Processes (POMDPs) as well as their generalizations, Predictive State Representations (PSRs) and Observable Operator Models (OOMs). POMDPs model the state ofthe world as a latent variable; in contrast, PSRs and OOMs represent state by tracking occurrence probabilities of a set of futureevents (called tests or characteristic events) conditioned on pastevents (called histories or indicative events). Unfortunately, exactplanning algorithms such as value iteration are intractable formost realistic POMDPs due to the curse of history and the curse ofdimensionality. However, PSRs and OOMs hold the promiseof mitigating both of these curses: first, many successful approximate planning techniques designed to address these problems inPOMDPs can easily be adapted to PSRs and OOMs. Second,PSRs and OOMs are often more compact than their correspondingPOMDPs (i.e., need fewer state dimensions), mitigating the curseof dimensionality. Finally, since tests and histories are observablequantities, it has been suggested that PSRs and OOMs should beeasier to learn than POMDPs; with a successful learning algorithm,we can look for a model which ignores all but the most importantcomponents of state, reducing dimensionality still further.In this paper we take an important step toward realizing the abovehopes. In particular, we propose and demonstrate a fast and statistically consistent spectral algorithm which learns the parametersof a PSR directly from sequences of action-observation pairs. Wethen close the loop from observations to actions by planning in thelearned model and recovering a policy which is near-optimal in theoriginal environment. Closing the loop is a much more stringenttest than simply checking short-term prediction accuracy, since thequality of an optimized policy depends strongly on the accuracy ofthe model: inaccurate models typically lead to useless plans. Closing the Learning-Planning Loop with Predictive State Representations Byron Boots, Sajid Siddiqi, Geoffrey Gordon

R-05

  Wikis today are being used as a tool to conduct collaborativewriting assignments in classrooms. However, typical Wikis (1) donot provide group formation methods to improve the collaborativelearning of the students and (2) suffer from typical problems ofcollaborative learning like free-riding (earning credit withoutcontribution) and lacking conveniences to facilitate teacherinterventions. To improve the state of the art of the typical Wikisused in classrooms, we have designed and implementedClassroomWiki — a Web-based collaborative Wiki writing toolthat combines a set of learner pedagogy theories with multiagenttracking, modeling, and group formation. For the students,ClassroomWiki provides a Web interface for writing and revisingtheir group's Wiki and a topic-based forum for discussing theirideas during collaboration. When the students collaborate,ClassroomWiki's agents track all student activities and builddetailed student models that represent their contributions towardtheir groups and uses MHCF algorithm to form student groups toimprove the collaborative learning of students. We have deployedClassroomWiki in two university-level courses to investigate theimpact of ClassroomWiki. The results show that ClassroomWikican (1) improve the collaborative learning outcome of the studentsby its group formation framework and (2) alleviate free-riding andfacilitate teacher interventions by its multiagent tracking andmodeling. ClassroomWiki: A Wiki for the Classroom with Multiagent Tracking, Modeling, and Group Formation Nobel Khandaker, Leen-Kiat Soh

R-28

  This paper proposes Market-based Iterative Risk Allocation (MIRA),a new market-based decentralized optimization algorithm for multi-agent systems under stochastic uncertainty, with a focus on problems with continuous action and state space. In large coordinationproblems, from power grid management to multi-vehicle missions,multiple agents act collectively in order to maximize the performance of the system, while satisfying mission constraints. Theseoptimal action plans are particularly susceptible to risk when uncertainty is introduced. We present a decentralized optimization algorithm that minimizes the system cost while ensuring that the probability of violating mission constraints is below a user-specified upper bound.We build upon the paradigm of risk allocation, in which theplanner optimizes not only the sequence of actions, but also its allocation of risk among state constraints. We extend the concept ofrisk allocation to multi-agent systems by highlighting risk as a resource that is traded in a computational market. The equilibriumprice of risk that balances the supply and demand is found by aniterative price adjustment process called t\^{a}tonnement (also knownas Walrasian auction). Our work is distinct from the classical t\^{a}tonnement approach in that we use Brent's method to provide fastguaranteed convergence to the equilibrium price. The simulationresults demonstrate the efficiency and optimality of the proposeddecentralized optimization algorithm. Market-based Risk Allocation for Multi-agent Systems Masahiro Ono, Brian Williams

R-34

  As robots become more commonplace, the tools to facilitate knowledge transfer from human to robot will be vital, especially for non-technical users. While some ongoingwork considers the role of human reinforcement in intelligent algorithms, the burden of learning is often placed solelyon the computer. These approaches neglect the expressive capabilities of humans, especially regarding our abilityto quickly refine motor skills. Thus, when designing autonomous robots that interact with humans, not only is itimportant to leverage machine learning, but it is also veryuseful to have the tools in place to facilitate the transfer ofknowledge between man and machine. We introduce such atool for enabling a human to transfer motion learning capabilities to a robot.In this paper, we propose a general framework for MotionAcquisition in Robots through Iterative Online EvaluativeTraining (MARIOnET). Specifically, MARIOnET represents a direct and real-time interface between a human ina motion-capture suit and a robot, with a training processthat provides a convenient human interface and requires notechnical knowledge. In our framework, the learning happens exclusively by the human - not the robot. However,the robot provides a natural interface for interaction, andis able to store and reuse trained behaviors autonomouslyin the future. Our approach exploits the ability at whichhumans are able to learn and refine fine-motor skills.Implemented on two robots (one quadruped and one biped),our results indicate that both technical and non-technicalusers are able to harness MARIOnET to quickly improve arobot's performance of a task requiring fine-motor skills. MARIOnET: Motion Acquisition for Robots through Iterative Online Evaluative Training Adam Setapen, Michael Quinlan, Peter Stonehas 5 papers

R-45

  Game-theoretic analyses of multi-agent systems typically assume that all agents have full knowledge of everyone's possible moves, information sets and utilities for each outcome.Bayesian games relax this assumption by allowing agents tohave different "types," representing different beliefs aboutthe game being played, and to have uncertainty over otheragents' types. However, applications of Bayesian games almost universally assume that all agents share a commonprior distribution over everyone's type. We argue, in concordwith certain economists, that such games fail to accuratelyrepresent many situations. However, when the commonprior assumption is abandoned, several modeling challengesarise, one of which is the emergence of complex belief hierarchies. In these cases it is necessary to specify which partsof other agents' beliefs are relevant to an agent's decision-making (or need be known by that agent). We address thisissue by suggesting a concise way of representing Bayesiangames with uncommon priors. Our representation centersaround the concept of a block, which groups agents' view of(a) the game being played and (b) their posterior beliefs.This allows us to construct the belief graph, a graphicalstructure that allows agents' knowledge of other agents' beliefs to be carefully specified. Furthermore, when agents'views of the world are represented by extensive form games,our block structure places useful semantic constraints onthe extensive form trees. Our representation can be usedto naturally represent games with rich belief structures andinteresting predicted behavior. Representing Bayesian Games Without a Common Prior Dimitrios Antos, Avi Pfefferhas 2 papers

R-57

  Combinatorial auctions (CAs) are an important mechanismfor allocating multiple goods while allowing self-interestedagents to specify preferences over bundles of items. Winner determination for a CA is known to be NP-complete.However, restricting the problem can allow us to solve winner determination in polynomial time. These restrictionssometimes apply to the CA's representation. There are twocommonly studied, and structurally different graph representations of a CA: bid graphs and item graphs. We studythe relationship between these two representations.We show that for a given combinatorial auction, if a graphwith maximum cycle length three is a valid item graph forthe auction, then its bid graph representation is a chordalgraph. Next, we present a new technique for constructingitem graphs using a novel definition of equivalence amongcombinatorial auctions. The solution to the WDP for a givenCA can easily be translated to a solution on an equivalentCA. We use our technique to simplify item graphs, and showthat if a CA's bid graph is chordal, then there exists anequivalent CA with a valid item graph of treewidth one, forwhich a solution to the WDP is known to be efficient. Thisresult demonstrates how CA equivalence can simplify thestructure of item graphs and lead to more efficient solutionsto the WDP, which are also a solutions to the WDP for theoriginal auctions. An Investigation of Representations of Combinatorial Auctions David Lokerhas 2 papers, Kate Larsonhas 6 papers

Blue Session

B-30

  Crowd behavior simulation has been an active field of research because of its utility in several applications such asemergency planning and evacuations, designing and planning pedestrian areas, subway or rail-road stations, besidesin education, training and entertainment. The most advanced and realistic simulation systems employ intelligentautonomous agents with a balance between individual andgroup intelligence for scalability of the architectures. Although several systems have even been commercialized, little attention has been accorded to the problem of validatingthe outcomes of these simulations in a generalized manner,against reality. The extent of validation fails to go muchbeyond visual matching between the simulation and the actual scenarios (with recordings of human crowds), which canlead to highly subjective and often questionable conclusions.The existing numerical measures often rely on ad-hoc applications, e.g., local crowd densities are measured to verifypatterns, without a systematic procedure to identify at whattimes in the simulation and the scenarios can the densities becompared. Furthermore, if there are multiple systems thatsimulate crowd behavior in the same scenario in the samevirtual environment, then no technique is currently known toquantitatively compare these systems in terms of realism. Inthis abstract, we present the first (to the best of our knowledge) principled, unified, and automated technique to quantitatively validate and compare the global performance ofcrowd egress simulation systems. We also evaluate a multiagent based crowd egress simulation system (that we haverecently developed, but we do not discuss this system here)using our technique and demonstrate a high degree of validity of that system as well. Validation of Agent Based Crowd Egress Simulation Bikramjit Banerjeehas 3 papers, Landon Kraemerhas 3 papers

 


Copyright © 2010 IFAAMAS