As learning agents move from research labs to the real world, it is increasingly important that human users, including those without programming skills, be able to teach agents desired behaviors. Recently, the TAMER framework was introduced for designing agents that can be interactively shaped by human trainers who give only positive and negative feedback signals. Past work on TAMER showed that shaping can greatly reduce the sample complexity required to learn a good policy, can enable lay users to teach agents the behaviors they desire, and can allow agents to learn within a Markov Decision Process (MDP) in the absence of a coded reward function. However, TAMER does not allow this human training to be combined with autonomous learning based on such a coded reward function. This paper leverages the fast learning exhibited within the TAMER framework to hasten a reinforcement learning (RL) algorithm's climb up the learning curve, effectively demonstrating that human reinforcement and MDP reward can be used in conjunction with one another by an autonomous agent. We tested eight plausible TAMER+RL methods for combining a previously learned human reinforcement function, $\hat{H}$, with MDP reward in a reinforcement learning algorithm. This paper identifies which of these methods are most effective and analyzes their strengths and weaknesses. Results from these TAMER+RL algorithms indicate better final performance and better cumulative performance than either a TAMER agent or an RL agent alone. Combining Manual Feedback with Subsequent MDP Reward Signals for Reinforcement Learning
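The abstract leaves the eight combination methods unspecified; purely as an illustration of the general idea, the Python sketch below augments the MDP reward with a weighted prediction from the learned human-reinforcement model inside a tabular Q-learning update. The names ($\hat{H}$ as \texttt{H\_hat}, the weight \texttt{beta}) and this specific combination are assumptions for the sketch, not necessarily one of the paper's eight tested methods.

\begin{verbatim}
from collections import defaultdict

Q = defaultdict(float)  # tabular action-value estimates

def q_update(R, H_hat, s, a, s_next, actions,
             alpha=0.1, gamma=0.99, beta=1.0):
    """One Q-learning step on MDP reward augmented by the learned
    human-reinforcement model H_hat (hypothetical combination)."""
    combined = R(s, a) + beta * H_hat(s, a)
    target = combined + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
\end{verbatim}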
We introduce the novel problem of inter-robot transfer learning for perceptual classification of objects, where multiple heterogeneous robots communicate and transfer learned object models consisting of a fusion of multiple object properties. Unlike in traditional transfer learning, there can be severe differences in the data distributions, resulting from differences in the sensing, sensory processing, or even representations that each robot uses to learn. Furthermore, only some properties may overlap between the two robots. We show that in such cases, the abstraction of raw sensory data into an intermediate representation can be used not only to aid learning, but also to aid the transfer of knowledge. Further, we utilize statistical metrics, learned during an interactive process where the robots jointly explore the environment, to determine which underlying properties are shared between the robots. We demonstrate results in a visual classification task where objects are represented via a combination of properties derived from different modalities: color, texture, shape, and size. Using our methods, two heterogeneous robots utilizing different sensors and representations are able to successfully transfer support vector machine (SVM) classifiers to one another, resulting in speedups during learning. Inter-Robot Transfer Learning for Perceptual Classification
Large heterogeneous teams will often be in situations where sensor data that is uncertain and conflicting is shared across a peer-to-peer network. Not every team member will have direct access to sensors, and team members will be influenced mostly by teammates with whom they communicate directly. In this paper, we investigate the dynamics and emergent behaviors of a large team sharing beliefs to reach conclusions about the world. We find empirically that the dynamics of information propagation in such belief sharing systems are characterized by information avalanches of belief changes caused by a single additional sensor reading. The distribution of the size of these avalanches dictates the speed and accuracy with which the team reaches conclusions. A key property of the system is that it exhibits qualitatively different dynamics and system performance over small changes in system parameter ranges. In one particular range, the system exhibits behavior known as scale-invariant dynamics, which we empirically find to correspond to dramatically more accurate conclusions being reached by team members. Because the ranges are very sensitive to configuration details, the parameter ranges over which specific system dynamics occur are extremely difficult to predict precisely. In this paper we (a) develop techniques to mathematically characterize the dynamics of team belief propagation, (b) obtain through simulations the relation between the dynamics and overall system performance, and (c) develop a novel distributed algorithm that the agents in the team use locally to steer the whole team to areas of optimized performance. Exploiting Scale Invariant Dynamics for Efficient Information Propagation in Teams
Learning, planning, and representing knowledge in large state spaces at multiple levels of temporal abstraction are key, long-standing challenges for building flexible autonomous agents. The options framework provides a formal mechanism for specifying and learning reusable temporally-extended skills. Although past work has demonstrated the benefit of acting according to options in large state spaces, one of the central advantages of temporal abstraction — the ability to plan using a temporally abstract model — remains a challenging problem when the number of environment states is large or infinite. In this work, we develop a knowledge construct, the linear option, which is capable of modeling temporally abstract dynamics in large state spaces. We show that planning with a linear expectation model of an option's dynamics converges to a fixed point with low Temporal Difference (TD) error. Next, building on recent work on linear feature selection, we show conditions under which a linear feature set is sufficient for accurately representing the value function of an option policy. We extend this result to show conditions under which multiple options may be repeatedly composed to create new options with accurate linear models. Finally, we demonstrate linear option learning and planning algorithms in a simulated robot environment. Linear Options
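The abstract does not reproduce the model equations; under one plausible reading, an option $o$ is summarized by a matrix $F_o$ predicting the expected feature vector at the option's termination and a vector $b_o$ predicting the expected discounted reward accumulated while it runs, and a linear value function $V(s) = \theta^\top \phi(s)$ is backed up through these models. A minimal Python sketch of such a planning step, with all notation assumed rather than taken from the paper:

\begin{verbatim}
import numpy as np

def option_backup(theta, models, phi):
    """Greedy backup over linear option models (F_o, b_o) at features
    phi: predicted within-option return plus the value of the
    predicted termination features."""
    return max(b @ phi + theta @ (F @ phi) for F, b in models)

def planning_step(theta, models, phi, alpha=0.1):
    """Semi-gradient TD-style update of theta toward the backup."""
    td_error = option_backup(theta, models, phi) - theta @ phi
    return theta + alpha * td_error * phi
\end{verbatim}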
The use of energy storage devices in homes has been advocated as one of the main ways of saving energy and reducing the reliance on fossil fuels in the future Smart Grid. However, if micro-storage devices are all charged at the same time using power from the electricity grid, the result is higher demand and, hence, more generation capacity, more carbon emissions, and, in the worst case, a system breakdown due to over-demand. To alleviate such issues, in this paper, we present a novel agent-based micro-storage management technique that allows all (individually-owned) storage devices in the system to converge to profitable, efficient behaviour. Specifically, we provide a general framework within which to analyse the Nash equilibrium of an electricity grid and devise new agent-based storage learning strategies that adapt to market conditions. Taken altogether, our solution shows that, specifically in the UK electricity market, a consumer with a 4 kWh storage device can achieve average savings of up to 13\% on their electricity bill. Moreover, we show that there exists an equilibrium where only 38\% of UK households would own storage devices and where social welfare would also be maximised (with overall annual savings of nearly GBP 1.5B at that equilibrium). Agent-based Micro-Storage Management for the Smart Grid
Many problems in multiagent decision making can be addressed using tournament solutions, i.e., functions that associate with each complete and asymmetric relation on a set of alternatives a non-empty subset of the alternatives. For any given tournament solution $S$, there is another tournament solution $\dot{S}$, which returns the union of all inclusion-minimal sets that satisfy $S$-retentiveness, a natural stability criterion with respect to $S$. Schwartz's tournament equilibrium set ($TEQ$) is then defined recursively as $TEQ = \dot{TEQ}$. Due to this unwieldy recursive definition, precious little is known about $TEQ$. Contingent on a well-known conjecture about $TEQ$, we show that $\dot{S}$ inherits a number of important and desirable properties from $S$. We thus obtain an infinite hierarchy of attractive and efficiently computable tournament solutions that "approximate" $TEQ$, which itself is intractable. This hierarchy contains well-known tournament solutions such as the top cycle ($TC$) and the minimal covering set ($MC$). We further prove a weaker version of the conjecture mentioned above, which establishes $\dot{TC}$ as an attractive new tournament solution. Minimal Retentive Sets in Tournaments
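For concreteness, the top cycle is simple to compute: in a tournament it is exactly the set of alternatives from which every other alternative is reachable along a dominance path. A small Python sketch (data structures are illustrative):

\begin{verbatim}
def top_cycle(alts, beats):
    """Top cycle of a tournament: alternatives that reach all others
    via dominance paths.  beats[x] is the set of alternatives x beats."""
    def reachable(x):
        seen, stack = {x}, [x]
        while stack:
            for y in beats[stack.pop()]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen
    return {x for x in alts if reachable(x) == set(alts)}

beats = {"a": {"b"}, "b": {"c"}, "c": {"a"}}   # a 3-cycle
print(top_cycle(["a", "b", "c"], beats))       # {'a', 'b', 'c'}
\end{verbatim}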
Within the domain of virtual agents, Theory of Mind reasoning is considered an important attribute that enables such agents to act in an intelligent fashion. Such ideas have been applied in various domains, ranging from serious games to virtual storytelling. Another interesting domain of application for agents equipped with Theory of Mind is the entertainment gaming industry, which imposes a completely different goal on the behavior of these agents, namely to give the players a fun experience. To this end, this paper introduces an approach to developing Theory of Mind agents for the entertainment gaming industry. PRS is used for the reasoning of the agent and has been extended with Theory of Mind reasoning capabilities. The agent's behavior has been evaluated against two other agents (a simple reactive agent and a memory-based agent) in an experiment in which 15 participants played against all agent types and rated each of them on certain criteria. Evaluation of Virtual Agents utilizing Theory of Mind in a Real Time Action Game
Incremental search algorithms, such as Generalized Fringe-Retrieving A* and D* Lite, reuse search trees from previous searches to speed up the current search and thus often find cost-minimal paths for series of similar search problems faster than by solving each search problem from scratch. However, existing incremental search algorithms have limitations. For example, D* Lite is slow on moving target search problems, where both the start and goal states can change over time. In this paper, we therefore introduce Moving Target D* Lite, an extension of D* Lite that uses the principle behind Generalized Fringe-Retrieving A* to repeatedly calculate a cost-minimal path from the hunter to the target in environments whose blockages can change over time. We demonstrate experimentally that Moving Target D* Lite is four to five times faster than Generalized Adaptive A*, which so far was believed to be the fastest incremental search algorithm for solving moving target search problems in dynamic environments. Moving Target D* Lite
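For background (the abstract does not include pseudocode), D* Lite-style algorithms order their priority queue by a two-part key; the term $k_m$ accumulates heuristic drift as the search start moves, which is the machinery Moving Target D* Lite builds on. A minimal Python sketch with assumed names, not the paper's pseudocode:

\begin{verbatim}
INF = float("inf")

def calculate_key(s, g, rhs, h, start, km):
    """D* Lite-style priority key for state s: g/rhs are the usual
    value estimates, h is the heuristic, and km compensates for the
    hunter (search start) having moved since the last replanning."""
    m = min(g.get(s, INF), rhs.get(s, INF))
    return (m + h(start, s) + km, m)
\end{verbatim}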
Video games provide a rich testbed for artificial intelligence methods. In particular, creating automated opponents that perform well in strategy games is a difficult task. For instance, human players rapidly discover and exploit the weaknesses of hard-coded strategies. To build better strategies, we suggest a reinforcement learning approach for learning a policy that switches between high-level strategies. These strategies are chosen based on different game situations and a fixed opponent strategy. As the results demonstrate, our learning agents are able to rapidly adapt to fixed opponents and improve upon deficiencies in the hard-coded strategies. High-level Reinforcement Learning in Strategy Games
Autonomous virtual agents generally lack the support to understand a fundamental character in their world — the user's avatar. This has a negative impact on their behavioural believability. In this paper, we present an approach to detect the intent underlying certain actions of the user. We begin by introducing the relation between intention and action, then proceed to elaborate a framework that can interpret the intent of an action based on the matching and mismatching of anticipated behaviour. We then present a test-case in which our framework is used to create an agent architecture controlling a virtual dog that interacts with the user within a virtual world. Finally, we discuss three aspects of our evaluation: recognition of the user's intent, its interpretation, and the solution's efficiency. Our results suggest that our solution can be used to detect certain intents and, in such cases, perform similarly to a human observer, with no impact on computational performance. I Mean It! Detecting User Intentions to create Believable Behaviour for Virtual Agents in Games
Speakers in dialogue tend to adapt to each other by starting to use similar lexical items, syntactic structures, or gestures. This behaviour, called alignment, may serve important cognitive, communicative and social functions (such as speech facilitation, grounding and rapport). Our aim is to enable and study the effects of these subtle aspects of communication in virtual conversational agents. Building upon a model for autonomous speech and gesture generation, we describe an approach to make the agent's multimodal behaviour adaptive in an interactive manner. This includes (1) an activation-based microplanner that makes linguistic choices based on lexical and syntactic priming, and (2) an empirically grounded gesture generation such that linguistic priming parallels concordant gestural adaptation. First results show that the agent aligns to its interaction partners by picking up their syntactic structures and lexical items in its subsequent utterances. These changes in the agent's verbal behaviour also have a direct influence on gestural expressions. Adaptive Expressiveness — Virtual Conversational Agents That Can Align to Their Interaction Partner
Virtual agents present a great opportunity for teaching inter-cultural competencies. Advantages such as the repeatability of training sessions, emotional distance to virtual characters, the opportunity to over-exaggerate or generalize behavior, or simply the savings on costs for human training partners support that idea. The way communication is coordinated, in particular, varies across cultures. In this paper, we present our approach to simulating differences in the management of communication for the American and Arabic cultures. To that end, we give an overview of behavioral tendencies described in the literature, pointing out differences between the two cultures. To ground our expectations in empirical data, we analyzed a multi-modal corpus. These findings were integrated into a demonstrator using virtual agents and evaluated in a preliminary study. A data-driven approach to model Culture-specific Communication Management Styles for Virtual Agents
Increasing teamwork between agents typically increases the performance of a multi-agent system, at the cost of increased communication and higher computational complexity. This work examines joint actions in the context of a multi-agent optimization problem where agents must cooperate to balance exploration and exploitation. Surprisingly, results show that increased teamwork can hurt agent performance, even when communication and computation costs are ignored, an effect we term the team uncertainty penalty. This paper introduces this phenomenon, analyzes it, and presents algorithms to reduce the effect of the penalty in our problem setting. When Should There be a "Me" in "Team"? Distributed Multi-Agent Optimization Under Uncertainty
In typical multiagent \emph{teamwork} settings, the teammates are either programmed together, or are otherwise provided with standard communication languages and coordination protocols. In contrast, this paper presents an \emph{ad hoc team} setting in which the teammates are not pre-coordinated, yet still must work together in order to achieve their common goal(s). We represent a specific instance of this scenario, in which a teammate has limited action capabilities and a fixed and known behavior, as a multiagent cooperative $k$-armed bandit. In addition to motivating and studying this novel ad hoc teamwork scenario, the paper contributes to the $k$-armed bandits literature by characterizing the conditions under which certain actions are potentially optimal, and by presenting a polynomial dynamic programming algorithm that solves for the optimal action when the arm payoffs come from a discrete distribution. To Teach or not to Teach? Decision Making Under Uncertainty in Ad Hoc Teams
We consider the issue of representing coalitional games in multi-agent systems that exhibit externalities from coalition formation, i.e., systems in which the gain from forming a coalition may be affected by the formation of other co-existing coalitions. Although externalities play a key role in many real-life situations, very little attention has been given to this issue in the multi-agent system literature, especially with regard to the computational aspects involved. To this end, we propose a new representation which, in the spirit of Ieong and Shoham, is based on Boolean expressions. The idea behind our representation is to construct much richer expressions that allow for capturing externalities induced upon coalitions. We show that the new representation is fully expressive, at least as concise as the conventional partition function game representation and, for many games, exponentially more concise. We evaluate the efficiency of our new representation by considering the problem of computing the Extended and Generalized Shapley value, a powerful extension of the conventional Shapley value to games with externalities. We show that by using our new representation, the Extended and Generalized Shapley value, which has not been studied in the computer science literature to date, can be computed in time linear in the size of the input. A Logic-Based Representation for Coalitional Games with Externalities
Distributed Constraint Optimization (DCOP) is a popular framework for cooperative multi-agent decision making. DCOP is NP-hard, so an important line of work focuses on developing fast incomplete solution algorithms for large-scale applications. One of the few incomplete algorithms to provide bounds on solution quality is k-size optimality, which defines a local optimality criterion based on the size of the group of deviating agents. Unfortunately, the lack of a general-purpose algorithm and the commitment to forming groups based solely on group size have limited the use of k-size optimality. This paper introduces t-distance optimality, which departs from k-size optimality by using graph distance as an alternative criterion for selecting groups of deviating agents. This throws open a new research direction into the tradeoffs between different group selection and coordination mechanisms for incomplete DCOP algorithms. We derive theoretical quality bounds for t-distance optimality that improve known bounds for k-size optimality. In addition, we develop a new efficient asynchronous local search algorithm for finding both k-size and t-distance optimal solutions, allowing these concepts to be deployed in real applications. Indeed, empirical results show that this algorithm significantly outperforms the only existing algorithm for finding general k-size optimal solutions, which is also synchronous. Finally, we compare the algorithmic performance of k-size and t-distance optimality using this algorithm. We find that t-distance consistently converges to higher-quality solutions in the long run, but results are mixed on convergence speed; we identify cases where each of k-size and t-distance converges faster. Asynchronous Algorithms for Approximate Distributed Constraint Optimization with Quality Bounds
DPOP is an algorithm for distributed constraint optimization which has, as its main drawback, the exponential size of some of its messages. Recently, some algorithms for distributed cluster tree elimination have been proposed. They also suffer from exponential size messages. However, using the strategy of cost function filtering, in practice these algorithms obtain important reductions in maximum message size and total communication cost. In this paper, we explain the relation between DPOP and these algorithms, and show how cost function filtering can be combined with DPOP. We present experimental evidence of the benefits of this new approach. Improving DPOP with Function Filtering
In this paper we investigate an approach to provide approximate, anytime algorithms for DCOPs that can provide quality guarantees. To this end, we propose the divide-and-coordinate (DaC) approach, which aims at solving a DCOP by iterating (1) a divide stage, in which agents divide the original DCOP into a set of simpler local subproblems and solve them; and (2) a coordinate stage, in which agents exchange local information that brings them closer to an agreement. Next, we formulate a novel algorithm, the Divide and Coordinate Subgradient Algorithm (DaCSA), a particular implementation of the DaC approach based on Lagrangian decompositions and the dual subgradient method. By relying on the DaC approach, DaCSA provides bounded approximate solutions. We empirically evaluate DaCSA, showing that it is competitive with other state-of-the-art approximate DCOP algorithms and can eventually outperform them while providing reasonable quality guarantees. Divide-and-Coordinate: DCOPs by agreement
Games are used to evaluate and advance Multiagent and Artificial Intelligence techniques. Most of these games are deterministic with perfect information (e.g., Chess and Checkers). A deterministic game has no chance element, and in a perfect information game, all information is visible to all players. However, many real-world scenarios with competing agents are stochastic (non-deterministic) with imperfect information. For two-player zero-sum perfect recall games, a recent technique called Counterfactual Regret Minimization (CFR) computes strategies that are provably convergent to an $\epsilon$-Nash equilibrium. A Nash equilibrium strategy is useful in two-player games since it maximizes its utility against a worst-case opponent. However, for multiplayer (three or more player) games, we lose all theoretical guarantees for CFR. Nevertheless, we believe that CFR-generated agents may perform well in multiplayer games. To test this hypothesis, we used this technique to create several 3-player limit Texas Hold'em poker agents, and two of them placed first and second in the 3-player event of the 2009 AAAI/IJCAI Computer Poker Competition. We also demonstrate that good strategies can be obtained by grafting sets of two-player subgame strategies to a 3-player base strategy after one of the players is eliminated. Using Counterfactual Regret Minimization to Create Competitive Multiplayer Poker Agents
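At the core of CFR is regret matching: at each information set, the current strategy plays each action with probability proportional to its positive cumulative counterfactual regret. The following minimal Python sketch shows just that step (variable names are illustrative, not taken from the paper):

\begin{verbatim}
def regret_matching(cum_regret):
    """Current strategy from cumulative counterfactual regrets:
    probabilities proportional to positive regret, uniform when
    no action has positive regret."""
    positives = [max(r, 0.0) for r in cum_regret]
    total = sum(positives)
    if total <= 0.0:
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positives]

print(regret_matching([4.0, -2.0, 1.0]))  # [0.8, 0.0, 0.2]
\end{verbatim}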
We study a particular multiagent resource allocation problem with indivisible, but sharable resources. In our model, the utility of an agent for using a bundle of resources is the difference between a valuation of that bundle and a congestion cost (or delay), a figure formed by adding up the individual congestion costs of each resource in the bundle. The valuation and the delay can be agent-dependent. When the agents that share a resource also share the resource's control, the current users of a resource will require some compensation when a new agent wants to use the resource. We study the existence of distributed protocols that lead to a social optimum. Depending on constraints on the valuation functions (mainly modularity), on the delay functions (e.g., convexity), and the structural complexity of the deals between agents, we prove either the existence of some sequences of deals or the convergence of all sequences of deals to a social optimum. When the agents do not have joint control over the resources (i.e., they can use any resource they want), we study the existence of pure Nash equilibria. We provide results for modular valuation functions and relate them to results from the literature on congestion games. Multiagent Resource Allocation with Sharable Items: Simple Protocols and Nash Equilibria
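Under this model, an agent's utility for a bundle is its valuation minus the summed congestion costs of the bundle's resources, where each resource's cost depends on how many agents currently share it. A minimal Python sketch with assumed interfaces:

\begin{verbatim}
def utility(valuation, delay, bundle, usage):
    """Agent utility for a bundle of sharable resources: valuation of
    the bundle minus per-resource congestion costs, where usage[r] is
    the number of agents using resource r (names are illustrative)."""
    return valuation(bundle) - sum(delay(r, usage[r]) for r in bundle)
\end{verbatim}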
Coalition structure generation (CSG) for multi-agent systems is a well-studied problem. A vast majority of the previous work and the state-of-the-art approaches to CSG assume a characteristic function form of the coalition values, where a coalition's value is independent of the other coalitions in the coalition structure. Recently, there has been interest in the more realistic {\em partition function} form of coalition values, where the value of a coalition is affected by how the other agents are partitioned, via {\em externalities}. However, the most recent studies in this direction have focused on cases where all externalities are either {\em always positive or always negative}, and results on coalition structure generation in more general settings (in particular, {\em mixed externalities}) are lacking. In this paper we propose a framework based on {\em agent-types} that incorporates mixed externalities and demonstrate that it includes the previous settings as special cases. We also generalize some previous results in anytime CSG, showing that those results are again special cases. In particular, we extend the existing branch and bound algorithm to this new setting and show empirically that significant pruning can be achieved when searching for the optimal coalition structure. This extends the state-of-the-art in CSG for multi-agent systems. Coalition Structure Generation in Multi-Agent Systems with Mixed Externalities
We present a new procedure for solving large games of imperfect information. Our approach involves — somewhat counterintuitively — solving an infinite approximation of the original game, then mapping the equilibrium to a strategy profile in the original game. Our main algorithm exploits a qualitative model of equilibrium structure as an additional input to find an equilibrium in continuous games. We prove that our approach is correct even if given a set of qualitative models (satisfying a technical property) of which only some are accurate. We compute equilibria in several classes of games for which no prior algorithms have been developed. In the course of our analysis, we also develop the first mixed-integer programming formulations for computing an $\epsilon$-equilibrium in general multiplayer normal and extensive-form games, based on the extension of our initial algorithm to the multiplayer setting, which may be of independent interest. Experiments suggest that our approach can outperform the prior state-of-the-art abstraction-based approaches. In addition, we demonstrate the effectiveness of our main algorithm on a subgame of limit Texas hold'em — the most studied imperfect-information game in computer science. Computing Equilibria by Incorporating Qualitative Models
We introduce a constrained mechanism design setting called internal implementation, in which the mechanism designer is explicitly modeled as a player in the game of interest. This distinguished player has the opportunity to modify the game before play. Specifically, the player is able to make reliable binding commitments of outcome-specific monetary transfers to the other players in the game. We characterize the power of internal implementation for certain interesting classes of games, and show that the impact of internal implementation on the players' utility and on social welfare is often counterintuitive; for example, the social welfare can be arbitrarily worse after an internal implementation. Internal Implementation
We consider the computational complexity of pure Nash equilibria in graphical games. It is known that the problem is NP-complete in general, but tractable (i.e., in P) for special classes of graphs such as those with bounded treewidth. It is then natural to ask: is it possible to characterize all tractable classes of graphs for this problem? In this work, we provide such a characterization for the case of bounded in-degree graphs, thereby resolving the gap between existing hardness and tractability results. In particular, we analyze the complexity of PUREGG(C, —), the problem of deciding the existence of pure Nash equilibria in graphical games whose underlying graphs are restricted to class C. We prove that, under reasonable complexity theoretic assumptions, for every recursively enumerable class C of directed graphs with bounded in-degree, PUREGG(C, —) is in polynomial time if and only if the reduced graphs (the graphs resulting from iterated removal of sinks) of C have bounded treewidth. We also give a characterization for PURECHG(C, —), the problem of deciding the existence of pure Nash equilibria in colored hypergraphical games, a game representation that can express the additional structure that some of the players have identical local utility functions. We show that the tractable classes of bounded-arity colored hypergraphical games are precisely those whose reduced graphs have bounded treewidth modulo homomorphic equivalence. Our proofs make novel use of Grohe's characterization of the complexity of homomorphism problems. Pure Nash Equilibria: Complete Characterization of Hard and Easy Graphical Games
We propose a method for constructing Dempster-Shafer belief functions modeling the trust of a given agent (the evaluator) in another (the target) by combining statistical information concerning the past behaviour of the target and arguments concerning the target's expected behaviour. These arguments are built from current and past contracts between evaluator and target. We prove that our method extends a standard computational method for trust that relies upon statistical information only. We observe experimentally that the two methods have identical predictive performance when the evaluator is highly "cautious", but our method gives a significant increase when the evaluator is not or is only moderately "cautious". Finally, we observe experimentally that target agents are more motivated to honour contracts when evaluated using our model of trust than when trust is computed on a purely statistical basis. Combining statistics and arguments to compute trust
In Open Multi-Agent Systems (OMAS), deciding with whom to interact is a particularly difficult task for an agent, as repeated interactions with the same agents are scarce and reputation mechanisms become increasingly unreliable. In this work, we present a coordination artifact which can be used by agents in an OMAS to make more informed decisions regarding partner selection, and thus to improve their individual utilities. This artifact monitors the interactions in the OMAS, evolves a role taxonomy, and assigns agents to roles based on their observed performance in different types of interactions. This information can be used by agents to better estimate the expected behaviour of potential counterparts in future interactions. We thus highlight the descriptive features of roles, which provide expectations of the behaviour of agents in certain types of interactions, rather than their normative facets. We empirically show that the use of the artifact helps agents to select better partners for their interactions than selection processes based only on agents' own experience. This is especially significant for agents that are newcomers to the OMAS. Role Evolution in Open Multi-Agent Systems as an Information Source for Trust
This paper concerns the problem of agent trust in an electronic marketplace. We maintain that agent trust involves making decisions under uncertainty and therefore the phenomenon should be modelled probabilistically. We therefore propose a probabilistic framework that models agent interactions as a Hidden Markov Model (HMM). The observations of the HMM are the interaction outcomes and the hidden state is the underlying probability of a good outcome. The task of deciding whether to interact with another agent reduces to probabilistic inference of the current state of that agent given all previous interaction outcomes. The model is extended to include a probabilistic reputation system which involves agents gathering opinions about other agents and fusing them with their own beliefs. Our system is fully probabilistic and hence delivers the following improvements with respect to previous work: (a) the model assumptions are faithfully translated into algorithms, and our system is optimal under those assumptions; (b) it can account for agents whose behaviour is not static with time; and (c) it can estimate the rate with which an agent's behaviour changes. The system is shown to significantly outperform previous state-of-the-art methods in several numerical experiments. A Probabilistic Model for Trust and Reputation
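The inference step this reduces to is the standard HMM forward (filtering) update: each new interaction outcome is folded into the belief over the partner's hidden behaviour state. A minimal Python sketch; the matrix layout is assumed rather than taken from the paper:

\begin{verbatim}
def forward_update(belief, trans, obs_prob, outcome):
    """One HMM filtering step.  trans[i][j] = P(next state j | state i),
    obs_prob[j][o] = P(outcome o | state j); returns the normalized
    posterior belief after observing `outcome`."""
    n = len(belief)
    new = [obs_prob[j][outcome] *
           sum(belief[i] * trans[i][j] for i in range(n))
           for j in range(n)]
    z = sum(new)
    return [p / z for p in new]
\end{verbatim}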
Establishing trust amongst agents is of central importance to the development of well-functioning multi-agent systems. For example, the anonymity of transactions on the Internet can lead to inefficiencies, e.g., a seller on eBay failing to ship a good as promised, or a user free-riding on a file-sharing network. Trust (or reputation) mechanisms can help by aggregating and sharing trust information between agents. Unfortunately, these mechanisms can often be manipulated by strategic agents. Existing mechanisms are either very robust to manipulation (i.e., manipulations are not beneficial for strategic agents) or very informative (i.e., good at aggregating trust data), but never both. This paper explores the trade-off between these competing desiderata. First, we introduce a metric to evaluate the informativeness of existing trust mechanisms. We then show analytically that trust mechanisms can be combined to generate new hybrid mechanisms with intermediate robustness properties. We establish through simulation that hybrid mechanisms can achieve higher overall efficiency in environments with risky transactions and mixtures of agent types (some cooperative, some malicious, and some strategic) than any previously known mechanism. Hybrid Transitive Trust Mechanisms
In open, dynamic multi-agent systems, agents may form short-term ad-hoc groups, such as coalitions, in order to meet their goals. Trust and reputation are crucial concepts in these environments, as agents must rely on their peers to perform as expected, and learn to avoid untrustworthy partners. However, ad-hoc groups introduce issues which impede the formation of trust relationships. For example, they may be short-lived, precluding agents from gaining the necessary experiences to make an accurate trust evaluation. This paper describes a new approach, inspired by theories of human organisational behaviour, whereby agents generalise their experiences with known partners as \emph{stereotypes} and apply these when evaluating new and unknown partners. We show how this approach can complement existing state of the art trust models, and enhance the confidence in the evaluations that can be made about trustees when direct and reputational information is lacking or limited. Bootstrapping Trust Evaluations through Stereotypes
In systems of autonomous self-interested agents, in which agents' neighbourhoods are defined by their connections to others, cooperation can arise through observation of the behaviour of neighbours to determine values of trust and reputation. While there are many techniques for encouraging cooperative behaviour within such systems, they often require a centralised authority or rely on reciprocity that is not always available. In response, this paper presents a decentralised mechanism for supporting cooperation without requiring reciprocity. The mechanism is based on tag-based cooperation, supplemented by assessing neighbourhood context and using simple rewiring to cope with cheaters. In particular, the paper makes two key contributions. First, it provides a technique for increasing resilience in the face of malicious behaviour by enabling individuals to rewire their connections to others and so modify their neighbourhoods. Second, it provides an empirical analysis of several strategies for rewiring, evaluating them through simulations. Changing Neighbours: Improving Tag-Based Cooperation
One of the most challenging aspects of reasoning, planning, and acting in a multi-agent domain is reasoning about what the agents know about the knowledge of their fellows, and taking it into account when planning and acting. In the past this has been done using modal and dynamic epistemic logics. In this paper we explore the use of answer set programming (ASP) and reasoning-about-action techniques for this purpose. These approaches present a number of theoretical and practical advantages. From the theoretical perspective, ASP's property of non-monotonicity (and several other features) allows us to express causality in an elegant fashion. From the practical perspective, recent implementations of ASP solvers have become very efficient, outperforming several other systems in recent SAT competitions. Finally, the use of ASP and reasoning-about-action techniques allows for the adaptation of a large body of research developed for single-agent domains to multi-agent domains. We begin our discussion by showing how ASP can be used to find Kripke models of a modal theory. We then illustrate how both the muddy children and the sum-and-product problems can be represented and solved using these concepts. We describe and implement a new kind of action, which we call "ask-and-truthfully-answer," and show how this action brings forth a new dimension to the muddy children problem. Using Answer Set Programming to model multi-agent scenarios involving agents' knowledge about other's knowledge
Decision making is the ability to decide on the best alternative among a set of candidates based on their value. In many real-world domains the value depends on events that occur dynamically, so that the decision is based on dynamically changing uncertain information. When there is a cost to waiting for more information, the question is when to make the decision. Do you stop and make the best decision you can, given the information you have so far, or do you wait until more information arrives so you can make a better decision? We propose a model that characterizes the influence of dynamic information on the utility of the decision. Based on this model, we present an optimal algorithm that guarantees the best time to stop. Unfortunately, its complexity is exponential in the number of candidates. We present an alternative framework in which the different candidates are solved separately. We formally analyze the alternative framework, and show how it leads to a range of specific heuristic algorithms. We evaluate the optimal and the simplest heuristic algorithms through experiments, and show that the heuristic algorithm is much faster than the optimal algorithm, and the utility of the winner it finds is close to the optimum. Decision Making with Dynamically Arriving Information
Several areas of multi-agent research, such as large-scale agent organization and experience-based decision making, demand novel perspectives and efficient approaches for multiscale information analysis. A recent breakthrough in harmonic analysis is diffusion geometry and diffusion wavelets, which offer a general framework for multiscale analysis of massive data sets. In this paper, we introduce the "diffusion" concept into the MAS field, and investigate the impacts of using diffusion distance on the performance of solution synthesis in experience-based multi-agent decision making. In particular, we take a two-dimensional perspective to explore the use of diffusion distance and Euclidean distance in identifying `similar' experiences — a key activity in the process of recognition-primed decision making. An experiment has been conducted on a data set including a large collection of battlefield decision experiences. It is shown that the performance of using diffusion distance can be significantly better than that of using Euclidean distance in the original experience space. This study allows us to generalize an anytime algorithm for multi-agent decision making, and it also opens the door to the application of diffusion geometry to multi-agent research involving massive data analysis. Using Geometric Diffusions for Recognition-Primed Multi-Agent Decision Making
Most previous logical accounts of goals do not deal with prioritized goals and goal dynamics properly. Many are restricted to achievement goals. In this paper, we develop a logical account of goal change that addresses these deficiencies. In our account, we do not drop lower priority goals permanently when they become inconsistent with other goals and the agent's knowledge; rather, we make such goals inactive. We ensure that the agent's chosen goals/intentions are consistent with each other and the agent's knowledge. When the world changes, the agent recomputes her chosen goals and some inactive goals may become active again. This ensures that our agent maximizes her utility. We prove that the proposed account has desirable properties. We also discuss previous work on postulates for goal revision. A Logical Framework for Prioritized Goal Change
Computational agents can assist people by guiding their decisions in ways that achieve their goals while also adhering to constraints on their actions. Because some domains are more naturally modeled by representing preferences and constraints separately, we seek to develop efficient techniques for solving such decoupled constraint optimization problems. This paper describes a parameterized formulation for decoupled constraint optimization problems that subsumes the state-of-the-art algorithm of Boutilier et al., representing a wider family of alternative algorithms. We empirically examine notable members of this family to highlight the spaces of decoupled constraint optimization problems for which each excels, expose fundamental relationships between different algorithmic variations, and use these insights to create and evaluate novel hybrids of these algorithms that a cognitive assistant agent can use to flexibly trade off solution quality with computational time. Generalized Solution Techniques for Preference-Based Constraint Optimization with CP-nets
The static asset protection problem (SAP) in a road network is that of allocating resources to protect vertices, given any possible behavior by an adversary determined to attack those assets. The dynamic asset protection (DAP) problem is a version of SAP where the asset is following a fixed and widely known route (e.g., a parade route) and needs to be protected. We formalize what it means for a given allocation of resources to be "optimal" for protecting a desired set of assets, and show that randomly allocating resources to a single edge-cut in the road network solves this problem. Unlike SAP, we show that DAP is not only an NP-complete problem, but that approximating DAP is also NP-hard. We provide the GreedyDAP heuristic algorithm to solve DAP and show experimentally that it works well in practice, using road network data for real cities. A Graph-Theoretic Approach to Protect Static and Moving Targets from Adversaries
Multi-agent learning is a crucial method to control or find solutions for systems, in which more than one entity is adaptive. In today's interconnected world, such systems are ubiquitous in many domains, including auctions in economics, swarm robotics in computer science, and politics in social sciences. Multi-agent learning is inherently more complex than single-agent learning and has a relatively thin theoretical framework supporting it. Recently, multi-agent learning dynamics have been linked to evolutionary game theory, allowing the interpretation of learning as an evolution of competing policies in the mind of the learning agents. The dynamical system from evolutionary game theory that has been linked to Q-learning predicts the expected behavior of the learning agents. Closer analysis however allows for two interesting observations: the predicted behavior is not always the same as the actual behavior, and in case of deviation, the predicted behavior is more desirable. This discrepancy is elucidated in this article, and based on the new insights Frequency Adjusted Q- (FAQ-) learning is proposed. This variation of Q-learning perfectly adheres to the predictions of the evolutionary model for an arbitrarily large part of the policy space. In addition to the theoretical discussion, experiments in the three classes of two-agent two-action games illustrate the superiority of FAQ-learning. Frequency Adjusted Multi-agent Q-learning
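The FAQ-learning update rescales each action's learning rate inversely by its selection probability, so rarely chosen actions are updated as strongly in expectation as frequently chosen ones. A Python sketch for a stateless repeated game, with illustrative parameter names:

\begin{verbatim}
def faq_update(Q, probs, i, reward, alpha=0.1, gamma=0.9, beta=0.01):
    """Frequency Adjusted Q-learning step for the selected action i:
    the effective rate min(beta / x_i, 1) * alpha counteracts how
    often the action is actually played (probs[i] = x_i)."""
    rate = min(beta / probs[i], 1.0) * alpha
    Q[i] += rate * (reward + gamma * max(Q) - Q[i])
\end{verbatim}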
We study the problem of knowledge reuse by a reinforcement learning agent. We are interested in how an agent can exploit policies that were learned in the past to learn a new task more efficiently in the present. Our approach is to elicit spatial hints from an expert suggesting the world states in which each existing policy should be more relevant to the new task. By using these hints with domain exploration, the agent is able to detect those portions of existing policies that are beneficial to the new task, therefore learning a new policy more efficiently. We call our approach Spatial Hints Policy Reuse (SHPR). Experiments demonstrate the effectiveness and robustness of our method. Our results encourage further study investigating how much more efficacy can be gained from the elicitation of very simple advice from humans. Using Spatial Hints to Improve Policy Reuse in a Reinforcement Learning Agent
An important drawback to the popular Belief, Desire, and Intentions (BDI) paradigm is that such systems include no element of learning from experience. In particular, the so-called context conditions of plans, on which the whole model relies for plan selection, are restricted to boolean formulas that are to be specified at design/implementation time. To address these limitations, we propose a novel BDI programming framework that, by suitably modeling context conditions as decision trees, allows agents to learn the probability of success for plans based on previous execution experiences. By using a probabilistic plan selection function, the agents can balance exploration and exploitation of their plans. We develop and empirically investigate two extreme approaches to learning the new context conditions and show that both can be advantageous in certain situations. Finally, we propose a generalization of the probabilistic plan selection function that yields a middle ground between the two extreme approaches, and which we thus argue is the most flexible and simple approach. Learning Context Conditions for BDI Plan Selection
Since the introduction of the LRTA* algorithm, real-time heuristic search algorithms have generally followed the same plan-act-learn cycle: an agent plans one or several actions based on locally available information, executes them, and then updates (i.e., learns) its heuristic function. Algorithm evaluation has almost exclusively been empirical, with the results often being domain-specific and incomparable across papers. Even when unification and cross-algorithm comparisons have been carried out in a single paper, there was no understanding of how efficient the learning process was with respect to a theoretical optimum. This paper addresses the problem with two primary contributions. First, we formally define a lower bound on the amount of learning any heuristic-learning algorithm needs to do. This bound is based on the notion of heuristic depressions and allows us to have a domain-independent measure of learning efficiency across different algorithms. Second, using this measure, we propose to learn "costs-so-far" ($g$-costs) instead of "costs-to-go" ($h$-costs). This allows us to quickly identify redundant paths and dead-end states, thereby leading to asymptotic performance improvement as well as 1-2 orders of magnitude convergence speed-ups in practice. On Learning In Agent-Centered Search
In complex real-world environments, traditional (tabular) techniques for solving Reinforcement Learning (RL) do not scale. Function approximation is needed, but unfortunately, existing approaches generally have poor convergence and optimality guarantees. Additionally, for the case of human environments, it is valuable to be able to leverage human input. In this paper we introduce Expanding Value Function Approximation (EVFA), a function approximation algorithm that returns the optimal value function given sufficient rounds. To leverage human input, we introduce a new human-agent interaction scheme, training regimens, which allow humans to interact with and improve agent learning in the setting of a machine learning game. In experiments, we show EVFA compares favorably to standard value approximation approaches. We also show that training regimens enable humans to further improve EVFA performance. In our user study, we find that non-experts are able to provide effective regimens and that they found the game fun. Using Training Regimens to Teach Expanding Function Approximators
PAC-MDP algorithms approach the exploration-exploitation problem of reinforcement learning agents in an effective way which guarantees that, with high probability, the algorithm performs near optimally for all but a polynomial number of steps. The performance of these algorithms can be further improved by incorporating domain knowledge to guide their learning process. In this paper we propose a framework to use partial knowledge about the effects of actions in a theoretically well-founded way. Empirical evaluation shows that our proposed method is more efficient than reward shaping, which represents an alternative approach to incorporating background knowledge. Our solution is also very competitive when compared with the Bayesian Exploration Bonus (BEB) algorithm. BEB is not PAC-MDP; however, it can exploit domain knowledge via informative priors. We show how to use the same kind of knowledge in the PAC-MDP framework in a way which preserves all theoretical guarantees of PAC-MDP learning. PAC-MDP Learning with Knowledge-based Admissible Models
Aggregating the judgments of a group of agents regarding a set of interdependent propositions can lead to inconsistent outcomes. One of the parameters involved is the agenda, the set of propositions on which agents are asked to express an opinion. We introduce the problem of checking the safety of the agenda: for a given agenda, can we guarantee that judgment aggregation will never produce an inconsistent outcome for any aggregation procedure satisfying a given set of axioms? We prove several characterisation results, establishing necessary and sufficient conditions for the safety of the agenda for different combinations of the most important axioms proposed in the literature, and we analyse the computational complexity of checking whether a given agenda satisfies these conditions. Complexity of Judgment Aggregation: Safety of the Agenda
We resolve an important open problem regarding the complexity of the (constructive) unweighted coalitional manipulation problem in Copeland$^\alpha$ elections, that is, the complexity of Copeland$^\alpha$-manipulation for $\alpha \in \{0,1\}$. Copeland$^\alpha$, $0 \le \alpha \le 1$, is an election system where for each pair of candidates we check which one is preferred by more voters (i.e., we conduct a head-to-head majority contest) and we give one point to this candidate and zero to the other; in case of a tie, both candidates receive $\alpha$ points. In the end, the candidates with the most points win. It is known that Copeland$^\alpha$-manipulation is NP-complete for all rational $\alpha \in [0,1] \setminus \{0, 1/2, 1\}$ (i.e., for all the reasonable cases except the three truly interesting ones). In this paper we show that the problem remains NP-complete for $\alpha \in \{0,1\}$. In addition, we resolve the complexity of Copeland$^\alpha$-manipulation for each rational $\alpha \in [0,1]$ for the case of irrational voters. Manipulation of Copeland Elections
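The scoring scheme just described is straightforward to compute; the Python sketch below assumes a hypothetical \texttt{prefers} interface returning pairwise vote counts.

\begin{verbatim}
def copeland_scores(candidates, prefers, alpha=0.5):
    """Copeland^alpha: one point per won head-to-head majority
    contest, alpha points to each candidate in a tie."""
    scores = {c: 0.0 for c in candidates}
    for i, x in enumerate(candidates):
        for y in candidates[i + 1:]:
            vx, vy = prefers(x, y), prefers(y, x)
            if vx > vy:
                scores[x] += 1.0
            elif vy > vx:
                scores[y] += 1.0
            else:
                scores[x] += alpha
                scores[y] += alpha
    return scores
\end{verbatim}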
A voting rule is an algorithm for determining the winner in an election, and there are several approaches that have been used to justify the proposed rules. One justification is to show that a rule satisfies a set of desirable axioms that uniquely identify it. Another is to show that the calculation that it performs is actually maximum likelihood estimation relative to a certain model of noise that affects voters (the MLE approach). The third approach, which has recently been actively investigated, is the so-called {\em distance rationalizability} framework. In it, a voting rule is defined via a class of consensus elections (i.e., a class of elections that have a clear winner) and a distance function. A candidate $c$ is a winner of an election $E$ if $c$ wins in one of the consensus elections that are closest to $E$ relative to the given distance. In this paper, we show that essentially any voting rule is distance-rationalizable if we do not restrict the two ingredients of the rule: the consensus class and the distance. Thus distance rationalizability of a rule does not by itself guarantee that the voting rule has any desirable properties. However, we demonstrate that the distance used to rationalize a given rule may provide useful information about this rule's behavior. Specifically, we identify a large class of distances, which we call {\em votewise} distances, and show that if a rule is rationalized via a distance from this class, many important properties of this rule can be easily expressed in terms of the underlying distance. This enables us to provide a new characterization of scoring rules and to establish a connection with the MLE framework. We also give bounds on the complexity of the winner determination problem for distance-rationalizable rules. On the Role of Distances in Defining Voting Rules
This paper deals with fair assignment problems in decision contexts involving multiple agents. In such problems, each agent has its own evaluation of costs and we want to find a fair compromise solution between individual points of view. Lorenz dominance is a standard decision model used in Economics to refine Pareto dominance while favoring solutions that fairly share happiness among agents. In order to enhance the discrimination possibilities offered by Lorenz dominance, we introduce here a new model called infinite order Lorenz dominance. We establish a representation result for this model using an ordered weighted average with decreasing weights. Hence we exhibit some properties of infinite order Lorenz dominance that explain how fairness is achieved in the aggregation of individual preferences. Then we explain how to solve fair assignment problems of $m$ items to $n$ agents, using infinite order Lorenz dominance and other models used for measuring inequalities. We show that this problem can be reformulated as a 0-1 non-linear optimization problem that can be solved, after a linearization step, by standard LP solvers. We provide numerical results showing the efficiency of the proposed approach on various instances of the paper assignment problem. Infinite order Lorenz dominance for fair multiagent optimization
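The representation result reduces fairness comparisons to an ordered weighted average in which the largest (worst) cost receives the largest weight; minimizing such an OWA therefore favours balanced cost vectors. A tiny Python illustration with assumed weights:

\begin{verbatim}
def owa(costs, weights):
    """Ordered weighted average with decreasing weights: costs are
    sorted worst-first, so the worst-off component weighs most."""
    assert all(w1 >= w2 for w1, w2 in zip(weights, weights[1:]))
    return sum(w * c
               for w, c in zip(weights, sorted(costs, reverse=True)))

# Equal total cost, but (4, 4) is fairer than (7, 1):
print(owa([7, 1], [0.7, 0.3]))  # 5.2
print(owa([4, 4], [0.7, 0.3]))  # 4.0
\end{verbatim}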
In many multiagent settings, situations arise in which agents must collectively make decisions while not every agent is supposed to have an equal amount of influence in the outcome of such a decision. Weighted voting games are often used to deal with these situations. The amount of influence that an agent has in a weighted voting game can be measured by means of various power indices. This paper studies the problem of finding a weighted voting game in which the distribution of the influence among the agents is as close as possible to a given target value. We propose a method to exactly solve this problem. This method relies on a new efficient procedure for enumerating weighted voting games with a fixed number of agents. The enumeration algorithm we propose works by exploiting the properties of a specific partial order over the class of weighted voting games. The algorithm enumerates weighted voting games with a fixed number of agents in time exponential in the number of agents, and polynomial in the number of games output. As a consequence, we obtain an exact anytime algorithm for designing weighted voting games. Enumeration and exact design of weighted voting games
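As background on the power indices mentioned above: the normalized Banzhaf index counts how often each agent is pivotal, i.e., turns a losing coalition into a winning one. For small numbers of agents it can be computed by brute-force enumeration, as in this Python sketch:

\begin{verbatim}
from itertools import combinations

def banzhaf(weights, quota):
    """Normalized Banzhaf index of the weighted voting game
    [quota; weights] via exhaustive coalition enumeration."""
    n = len(weights)
    swings = [0] * n
    for r in range(n + 1):
        for coalition in combinations(range(n), r):
            total = sum(weights[i] for i in coalition)
            if total < quota:  # losing coalition: who can tip it?
                for i in set(range(n)) - set(coalition):
                    if total + weights[i] >= quota:
                        swings[i] += 1
    z = sum(swings)
    return [s / z for s in swings] if z else [0.0] * n

print(banzhaf([2, 1, 1], 3))  # [0.6, 0.2, 0.2]
\end{verbatim}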
In this paper, we study a maximum likelihood estimation (MLE) approach to voting when the set of alternatives has a multi-issue structure and the voters' preferences are represented by CP-nets. We first consider general multi-issue domains, and study whether and how issue-by-issue voting rules and sequential voting rules can be represented by MLEs. We show that issue-by-issue voting rules in which each local rule is itself an MLE (resp. a candidate scoring rule) can be represented by MLEs with a weak (resp. strong) decomposability property. Then, we prove two theorems stating that if the noise model satisfies a very weak decomposability property, then no sequential voting rule that satisfies unanimity can be represented by an MLE, unless the number of voters is bounded. We then consider multi-issue domains in which each issue is binary; for these, we propose a general family of distance-based noise models, of which we give an axiomatic characterization. We then propose a more specific family of natural distance-based noise models that are parameterized by a threshold. We identify the complexity of winner determination for the corresponding MLE voting rule in the two most important subcases of this framework. Aggregating Preferences in Multi-Issue Domains by Using Maximum Likelihood Estimators
The paper applies modal logic to formalize fragments of argumentation theory. Such formalization allows one to import, for free, a wealth of new notions (e.g., argument equivalence), new techniques (e.g., calculi, model-checking games, bisimulation games), and results (e.g., completeness of calculi, adequacy of games, complexity of model-checking) from logic to argumentation. On the Logic of Argumentation Theory
A conflicting knowledge base can be seen abstractly as a set of arguments and a binary relation characterising conflict among them. There may be multiple plausible ways to evaluate conflicting arguments. In this paper, we ask: {\em given a set of agents, each with a legitimate subjective evaluation of a set of arguments, how can they reach a collective evaluation of those arguments?} After formally defining this problem, we extensively analyse an argument-wise plurality voting rule, showing that it suffers from a fundamental limitation. Then we demonstrate, through a general impossibility result, that this limitation is more fundamentally rooted. Finally, we show how this impossibility result can be circumvented by additional domain restrictions. Collective Argument Evaluation as Judgement Aggregation
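A minimal sketch of the argument-wise plurality rule analysed above: each agent submits a labelling of every argument, and the majority label is taken per argument. As the paper's limitation result suggests, the aggregate labelling need not itself be a legal evaluation of the underlying attack relation.

```python
from collections import Counter

def argumentwise_plurality(labellings):
    """Aggregate per argument: take the most common label ('in', 'out',
    'undec') across agents, argument by argument. The result need not be
    a legal labelling of the underlying attack relation."""
    args = labellings[0].keys()
    return {a: Counter(l[a] for l in labellings).most_common(1)[0][0]
            for a in args}

agents = [
    {'a': 'in',  'b': 'out', 'c': 'in'},
    {'a': 'in',  'b': 'in',  'c': 'out'},
    {'a': 'out', 'b': 'out', 'c': 'in'},
]
print(argumentwise_plurality(agents))  # {'a': 'in', 'b': 'out', 'c': 'in'}
```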
A number of recent research lines address complex negotiations in highly rugged utility spaces. However, most of them focus on overcoming the problems imposed by the complexity of the scenario, without analyzing the strategic behavior of the agents in the models they propose. Analyzing the dynamics of the negotiation process when agents with different strategies interact is necessary to apply these models to real, competitive environments, where agents cannot be assumed to behave in the same way. Especially problematic are situations like the well-known prisoner's dilemma or, more generally, situations with a high price of anarchy. These situations imply that individual rationality drives the agents towards strategies which yield low individual and social welfare. In highly rugged scenarios, such situations usually make agents fail to reach an agreement, and therefore negotiation mechanisms should be designed to avoid them. This paper performs a strategy analysis of an auction-based negotiation model designed for highly rugged scenarios, revealing that the approach is prone to the prisoner's dilemma. In addition, a set of techniques to solve this problem is proposed, and an experimental evaluation is performed to validate the adequacy of the proposed approaches for improving the strategic stability of the negotiation process. Avoiding the Prisoner's Dilemma in Auction-based Negotiations for Highly Rugged Utility Spaces
Successful interaction between autonomous agents is contingent on those agents making decisions consistent with the expectations of their peers — these expectations are based on their beliefs about the current state of the environment in which interaction occurs. Contradictory beliefs lead to unintended and often unjustified outcomes. Given a shared interaction protocol to which all agents agree to adhere, it is possible to identify the constraints upon which the outcome of an interaction rests as it unfolds, and so, prior to resolving those constraints, agents can compare and reconcile any relevant expectations by a process of argumentation. In this paper, we introduce a mechanism by which agents can efficiently articulate their current beliefs in order to influence the resolution by their peers of constraints imposed on a distributed interaction and thus influence its outcome. We understand this as an opportunistic process of belief synchronisation within a restricted argument space, such that all decisions can be said to be admissible given the information that it is practical for agents to share with one another. Thus, we use the distributed knowledge dispersed amongst an agent group to make better decisions in interaction without resorting to more complex, domain-specific protocols. Opportunistic Belief Reconciliation During Distributed Interactions
This paper presents an argumentative version of the well-known alternating offers negotiation protocol. The negotiation mechanism is based on an abstract preference-based argumentation framework where both epistemic and practical arguments are taken into consideration in order to decide about different strategic issues. Such issues are the offer that is proposed at each round, acceptance or refusal of an offer, and concession or withdrawal from the negotiation. The argumentation framework shows clearly how offers are linked to practical arguments that support them, as well as how the latter are influenced by epistemic arguments. Moreover, it illustrates how the evolution of agents' argumentative theories, due to the exchange of arguments, influences the negotiation outcome. Finally, a generic algorithm that implements a concession-based negotiation strategy is presented. Argumentative Alternating Offers
Information Aggregation Markets (IAMs) constitute a mechanism whose purpose is to collect and aggregate information. They are commonly known as 'prediction markets' because they are often used to predict future events. In order for IAMs to efficiently aggregate information, they should attract participants with diversity of opinion, independence of thought and decentralization of knowledge. When participants are typically well-informed, IAMs will aggregate information into market prices. IAMs can be considered as a large-scale, open, distributed system used by many human participants whose information needs to be aggregated. In this paper we propose a different approach for information aggregation using IAMs. We use autonomous, interacting agents instead of human participants and we show that agent-based IAMs can act as an information aggregation mechanism able to perform predictions without human involvement. Agent Based Information Aggregation Markets
This paper seeks to combine two largely independent threads of multiagent systems research—agent specification and protocols. We specify agents in terms of goal models (as used in Tropos). We specify protocols in terms of the commitments among agents. We illustrate and formalize the semantic relationship between agents and protocols by exploiting the relationship between goals and commitments. Given an agent specification and a protocol, the semantics helps us perform two kinds of verification: (1) whether the protocol supports achieving particular agent goals, and (2) whether the agent's specification supports the satisfaction of particular commitments. Reasoning about Agents and Protocols via Goals and Commitments
In many applications, tasks can be delegated to intelligent agents. In order to carry out a task, an agent should reason about what types of resources the task requires. However, determining the right resource types requires extensive expertise and domain knowledge. In this paper, we propose means to automate the selection of resource types that are required to fulfill tasks. Our approach combines ontological reasoning and logic programming for a flexible matchmaking of resources to tasks. Using the proposed approach, intelligent agents can autonomously reason about the resources and tasks in various real-life settings. Using a case study, we describe and evaluate how agents can use the proposed approach to promote resource sharing. Our evaluations show that the proposed approach is efficient and very useful for multi-agent systems. Flexible Task Resourcing for Intelligent Agents
We propose Alternating-time Dynamic Logic (ADL) as a multi-agent variant of Dynamic Logic in which atomic programs are replaced by coalitions. In ADL, the Dynamic Logic operators are parametrised with regular expressions over coalitions and tests. Such regular expressions describe the dynamic structure of a coalition. This means that, when moving from Dynamic Logic to ADL, the focus shifts away from describing what is executed and when, toward describing who is acting and when. While Dynamic Logic provides for reasoning about complex programs, ADL facilitates reasoning about coalitions with an inner dynamic structure, so-called coordinated coalitions. The semantics for such coalitions involves partial strategies and a variety of ways to combine them. Different combinations of partial strategies give rise to different semantics for ADL. In this paper, we mainly focus on one version of the semantics, but we provide a discussion of other semantic variants of ADL together with possible syntactic extensions. We see ADL as suitable for the specification and verification of scheduling and planning systems, and we therefore present a model checking algorithm for ADL and investigate its computational complexity. Alternating-time Dynamic Logic
Many problems in AI and multi-agent systems research are most naturally formulated in terms of the abilities of a coalition of agents. There exist several excellent logical tools for reasoning about coalitional ability. However, coalitional ability can be affected by the availability of resources, and there is no straightforward way of reasoning about resource requirements in logics such as Coalition Logic (CL) and Alternating-time Temporal Logic (ATL). In this paper, we propose a logic for reasoning about coalitional ability under resource constraints. We extend ATL with costs of actions and hence of strategies. We give a complete and sound axiomatisation of the resulting logic Resource-Bounded ATL (RB-ATL) and an efficient model-checking algorithm for it. Resource-bounded alternating-time temporal logic
We imagine high-level "planning" programs as programs built from achievement and maintenance goals. Their execution requires the ability to meet such goals while respecting the program's control flow. The question then is: can we always guarantee the execution of such programs? In this paper, we define this novel planning-programming problem formally, and propose a technique to actually generate a solution by appealing to recent results in LTL-based synthesis of reactive systems. Agent Programming via Planning Programs
Agent composition is the problem of realizing a "virtual" agent by suitably directing a community of available "concrete", i.e., already implemented, agents. It is a synthesis problem, since its solution amounts to synthesizing a controller that suitably directs the available agents. Agent composition has its roots in certain forms of service composition advocated for SOA, and it has recently been actively studied by the AI and agents communities. In this paper, we show that agent composition can be solved by ATL (Alternating-time Temporal Logic) model checking. This result is of great interest for at least two contrasting reasons. First, from the point of view of agent composition, it gives access to some of the most modern model checking techniques and highly efficient tools, such as MCMAS, that have been recently developed by the agents community. Second, from the point of view of ATL verification tools, it gives a novel concrete problem to look at, which puts the emphasis on actually synthesizing winning policies (the controller) instead of just checking that they exist. Agent Composition Synthesis based on ATL
A way to design complex system simulations is to build a society of interacting and co-evolving models. However, most of the time models and simulators already exist and come from different scientific fields. Beyond the technical issues in making different simulators cooperate, the challenge then is to make the co-evolution design and implementation easier for users, who rarely know intricate modelling and simulation tools. The agents and artefacts (A&A) paradigm simplifies the design and the implementation of a society of interacting and co-evolving models: the addition, removal or exchange of models requires less effort. Contrary to classical approaches, we have built a decentralized co-evolution architecture based upon A&A and a data-driven coordination model. In this article, beyond the architecture presentation, we focus on the benefit provided by A&A for the co-evolution of multiple models. AA4MM: Building complex system simulation as a set of interacting models
Experimental analysis of networks of cooperative learning agents (to verify certain properties such as the system's stability) has been commonly used due to the complexity of theoretical analysis in such cases. Due to the large number of parameters to analyze, researchers have used metrics that summarize the system in a few parameters. Since in cooperative systems the ultimate goal is to optimize some global metric, researchers typically analyzed the evolution of the global performance metric over time to verify system properties. For example, if the global metric improves and eventually stabilizes, this is considered a reasonable verification of the system's stability. The global performance metric, however, overlooks an important aspect of the system: the network structure. We show an experimental case study where the convergence of the global performance metric is deceiving, hiding an underlying instability in the system that later leads to a significant drop in performance. To expose such instability, we propose the use of the graph analysis methodology, where the network structure is summarized using some network measures. We develop a new network measure that summarizes an agent's interaction with its neighbors and takes the disparity of these interactions into account. The new measure is applied to our case study, clearly exposing the instability that was previously hidden by the global performance metric. Using Graph Analysis to Study Networks of Adaptive Agents
An important research topic within Environmental Criminology is the analysis of the spatio-temporal dynamics of crime. Some of the main challenges in this area are the prediction and prevention of criminal hot spots. This paper presents an agent-based framework that is able to address such challenges. The framework exploits simulation techniques to compare different strategies for guardian movement in terms of their efficiency (low costs) and effectiveness (high prevention rate). In addition, by automated checks, more detailed properties of the different strategies can be studied. As a result, the framework can be used as a tool to assist researchers in their theory building, and potentially also policy makers in their decision making. To illustrate the approach, a number of strategies for guardian movement are compared, and the results are discussed. An Agent-Based Framework to Support Crime Prevention
It is to date an open question how updating methods affect the evolution of a multi-agent system. This question has been tackled for various complex systems such as cellular automata, Boolean networks and neural networks, but little is known for multi-agent systems, especially for models in which a complex behaviour emerges from simple local rules. This paper focuses on a multi-turmite model, namely the multiple Langton's ants model. All the agents are updated simultaneously, and the variation of the updating scheme consists only in choosing different strategies for resolving the conflicts produced when two or more agents want to move to the same location. We show that for the same formulation of the agents' behaviour, and the same initial conditions, the use of different updating schemes may lead to qualitatively different evolutions of the system. As a positive spin-off of this study, we exhibit new phenomena of the multi-turmite model, such as deadlocks and gliders. How important are updating schemes in multi-agent systems? An illustration on a multi-turmite model
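A sketch of one synchronous step of such a multi-turmite model, with one illustrative conflict-resolution strategy (random priority among contending ants; losers stay put but still turn); the paper studies several such strategies, and this particular rule, grid size and initial configuration are invented for the example.

```python
import random

MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]   # up, right, down, left

def step(grid, ants, size, rng):
    """One synchronous step of a multiple Langton's ants model on a torus.
    Each ant turns right on a white cell (0), left on a black cell (1),
    flips its cell, then tries to move forward. Conflict strategy (one
    possible choice): a random winner moves; the others stay put."""
    intents = []
    for (x, y, d) in ants:
        turn = 1 if grid[(x, y)] == 0 else -1
        nd = (d + turn) % 4
        dx, dy = MOVES[nd]
        intents.append(((x, y), nd, ((x + dx) % size, (y + dy) % size)))
    for (pos, _, _) in intents:
        grid[pos] ^= 1                         # every ant flips its cell
    targets = {}
    for i, (_, _, tgt) in enumerate(intents):
        targets.setdefault(tgt, []).append(i)
    new_ants = list(ants)
    for tgt, claimants in targets.items():
        winner = rng.choice(claimants)         # resolve the conflict
        for i in claimants:
            (x, y), nd, _ = intents[i]
            new_ants[i] = (tgt[0], tgt[1], nd) if i == winner else (x, y, nd)
    return new_ants

rng = random.Random(0)
size = 50
grid = {(x, y): 0 for x in range(size) for y in range(size)}
ants = [(10, 10, 0), (11, 10, 0)]
for _ in range(1000):
    ants = step(grid, ants, size, rng)
print(ants)
```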
Agent-based simulations are an increasingly popular means of exploring and understanding complex social systems. In order to be useful, these simulations must capture a range of aspects of the modeled situation, each possibly requiring distinct expertise. Moreover, different paradigms may be useful in modelling, ranging from those that use many lightweight reactive agents, to those that use cognitive agents, to those that focus on agent teams and organisational structures. There is a need for an architecture which supports the development of a large simulation through the integration of separately developed modules. This paper describes a framework and architecture which facilitates the integration of multiple agent-based simulations into a single global simulation. This architecture naturally supports distributed simulation and incremental development, which are ways of addressing the computational and conceptual complexity of such systems. In this paper we focus particularly on how to ensure proper management of simulation data that is affected by agents in different modules at the same logical time. We also provide some preliminary performance evaluation addressing scalability, as well as a comparison of how other available systems handle the issue of shared data. An Architecture for Modular Distributed Simulation with Agent-Based Models
Agents in a multi-agent system do not act in a vacuum. The outcome of their efforts depends on the environment in which they seek to act, and in particular on the efforts of other agents with whom they share the environment. We review previous efforts to address this problem, including active environments, concurrency modeling, recursive reasoning, and stochastic processes. Then we propose an approach that combines active environments and stochastic processes while addressing their limitations: a swarming agent simulation (which maintains transition probabilities dynamically, avoiding the static assumptions most convenient with traditional Markov models), applied concurrently to multiple perspectives (thus partitioning the active environment and addressing its scalability challenges). We demonstrate this method on a simple example. Agent Interaction, Multiple Perspectives, and Swarming Simulation
Appearance-based localization compares the current image taken from a robot's camera to a set of pre-recorded images in order to estimate the current location of the robot. Such techniques often maintain a graph of images, modeling the dynamics of the image sequence. This graph is used to navigate in the space of images. In this paper we bring a set of techniques together, including Partially Observable Markov Decision Processes, hierarchical state representations, visual homing, human-robot interactions, and so forth, into the appearance-based approach. Our approach provides a complete solution to the deployment of a robot in a relatively small environment, such as a house or a workplace, allowing the robot to robustly navigate the environment after minimal training. We demonstrate our approach in two environments using a real robot, showing how, after a short training session, the robot is able to navigate well in the environment. Augmenting Appearance-Based Localization and Navigation using Belief Update
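For background, the "belief update" in the title is, at its core, the standard Bayesian filter over a POMDP; a generic sketch, where the transition and observation models and the tiny image graph are placeholders rather than the paper's actual models:

```python
def belief_update(belief, action, observation, T, O):
    """Standard Bayesian belief update for a POMDP.
    belief: dict state -> probability
    T[(s, a)]: dict s' -> P(s' | s, a)
    O[(s', a)]: dict obs -> P(obs | s', a)
    Returns the normalized posterior belief."""
    new_belief = {}
    for s2 in {s2 for (s, a), dist in T.items() for s2 in dist}:
        pred = sum(T.get((s, action), {}).get(s2, 0.0) * p
                   for s, p in belief.items())
        new_belief[s2] = O.get((s2, action), {}).get(observation, 0.0) * pred
    z = sum(new_belief.values())
    if z == 0:
        raise ValueError("observation impossible under the model")
    return {s: p / z for s, p in new_belief.items()}

# Toy two-node image graph: moving 'fwd' from n0 usually reaches n1.
T = {('n0', 'fwd'): {'n1': 0.8, 'n0': 0.2},
     ('n1', 'fwd'): {'n1': 1.0}}
O = {('n0', 'fwd'): {'img0': 0.9, 'img1': 0.1},
     ('n1', 'fwd'): {'img0': 0.2, 'img1': 0.8}}
b = {'n0': 1.0, 'n1': 0.0}
print(belief_update(b, 'fwd', 'img1', T, O))   # mass shifts to n1
```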
Ant robots have very low computational power and limited memory. They communicate by leaving pheromones in the environment. In order to create a cooperative intelligent behavior, ants may need to get together; however, they may not know the locations of other ants. Hence, we focus on an ant variant of the rendezvous problem, in which two ants are to be brought to the same location in finite time. We introduce two algorithms that solve this problem for two ants by simulating a bidirectional search in different environment settings: one for an environment with no obstacles, and a general algorithm that handles all types of obstacles. We provide a detailed discussion of their different attributes, the amount of pheromone required, and the performance of these algorithms. Ants Meeting Algorithms
In this paper, we present a new trajectory planning algorithm for virtual humans. Our approach focuses on implicit cooperation between multiple virtual agents in order to share the work of avoiding collisions with each other. Specifically, we extend recent work on multi-robot planning to better model how humans avoid collisions by introducing new parameters that model human traits, such as reaction time and biomechanical limitations. We validate this new model against data of real humans walking captured by the Locanthrope project. We also show how our model extends to complex scenarios with multiple agents interacting with each other and avoiding nearby obstacles. Modeling Collision Avoidance Behavior for Virtual Humans
We present a formal framework of an autonomous agent as a collection of coordinated control loops, with a recurring sense-plan-act cycle. Our framework manages the information flow within the partitioned structure to ensure consistency and to direct the flow of goals and observations in a timely manner. The resulting control structure improves scalability, since many details of each controller can be encapsulated within a single control loop. This partitioned agent design promises a domain-independent, scalable and robust approach to the control of real-world autonomous robots operating in dynamic environments. We validate our framework with experimental results from deployments in two different real-world domains. A Systematic Agent Framework for Situated Autonomous Systems
The problem of multi-robot patrol in adversarial environments has been gaining considerable interest in recent years. In this problem, a team of mobile robots is required to repeatedly visit some target area in order to detect penetrations that are controlled by an adversary. Little has been written so far on the nature of the penetration event, and it is commonly assumed that the goal of the robots is to detect the penetration at any time during its occurrence. In this paper we offer a new definition of an event, correlated with a utility function such that detection of the event by the robots at different stages of its occurrence grants the robots a different reward. The goal of the robots is, then, to maximize their utility from detecting the event. We provide three different models of events, for which we describe algorithms for calculating the expected utility from detecting the event, and discuss how the model influences the optimality of the patrol algorithm. In the first and basic model, we assume that there exists a reward function such that detecting an event at different times grants the robots an associated reward. In the second model, the event might evolve during its occurrence, and this progression correlates with both different rewards and a growing probability of detection. Finally, we consider a general model, in which the event can be detected from a distance, where the probability of detection depends both on the distance from the robot and on the current state of the event. Last, we discuss how the new event models presented in this paper lay the grounds for handling the problem of patrol in heterogeneous environments, where parts of the perimeter may be more sensitive to the occurrence of events. On Events in Multi-Robot Patrol in Adversarial Environments
We introduce a novel case study in which a group of miniaturized robots screen an environment for undesirable agents, and destroy them. Because miniaturized robots are usually endowed with reactive controllers and minimalist sensing and actuation capabilities, they must collaborate in order to achieve their task efficiently. In this paper, we show how aggregation can mediate both collective perception and action while maintaining the scalability of the algorithm. First, we demonstrate the feasibility of our approach by implementing it on a real group of Alice mobile robots, which are only two centimeters in size. Then, we use a combination of realistic simulations and macroscopic models in order to find optimal parameters that maximize the number of undesirable cells destroyed while minimizing the impact on the healthy population. Finally, we discuss the limitations of these models in terms of accuracy, computational cost, and scalability, and we outline the importance of an appropriate multi-level modeling methodology to ensure the relevance and the faithfulness of such models. Aggregation-mediated Collective Perception and Action in a Group of Miniature Robots
In the strategyproof classification setting, a set of labeled examples is partitioned among multiple agents. Given the reported labels, an optimal classification mechanism returns a classifier that minimizes the number of mislabeled examples. However, each agent is interested in the accuracy of the returned classifier on its own examples, and may misreport its labels in order to achieve a better classifier, thus contaminating the dataset. The goal is to design strategyproof mechanisms that correctly label as many examples as possible. Previous work has investigated the foregoing setting under limiting assumptions, or with respect to very restricted classes of classifiers. In this paper, we study the strategyproof classification setting with respect to prominent classes of classifiers — boolean conjunctions and linear separators — and without any assumptions on the input. On the negative side, we show that strategyproof mechanisms cannot achieve a constant approximation ratio, by showing that such mechanisms must be dictatorial on a subdomain, in the sense that the outcome is selected according to the preferences of a single agent. On the positive side, we present a randomized mechanism — Iterative Random Dictator — and demonstrate both that it is strategyproof and that its approximation ratio does not increase with the number of agents. Interestingly, the notion of dictatorship is prominently featured in all our results, helping to establish both upper and lower bounds. On the Limits of Dictatorial Classification
We consider collusion in multi-unit auctions where the allocation and payments are determined using the VCG mechanism. We show how collusion can increase the utility of the colluders, characterize the optimal collusion, and show it can easily be computed in polynomial time. We then analyze the colluders' coalition from a cooperative game theoretic perspective. We show that the collusion game is a convex game, so it always has a non-empty core, which contains the Shapley value. We show how to find core imputations and compute the Shapley value, and thus show that in this setting the colluders can always share the gain from their manipulation in a stable and fair way. This shows that this domain is extremely vulnerable to collusion. Honor Among Thieves - Collusion In Multi-Unit Auctions
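For intuition, the Shapley value mentioned above is the average marginal contribution of each player over all join orders of the coalition; a brute-force sketch, where the characteristic function is a toy example rather than the collusion game's actual value function:

```python
from itertools import permutations

def shapley(players, v):
    """Brute-force Shapley value: average marginal contribution over all
    join orders. Exponential; fine for small coalitions."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: x / len(orders) for p, x in phi.items()}

# Toy characteristic function: gain a colluding coalition can secure.
def v(S):
    table = {frozenset(): 0, frozenset('a'): 1, frozenset('b'): 1,
             frozenset('c'): 0, frozenset('ab'): 4, frozenset('ac'): 2,
             frozenset('bc'): 2, frozenset('abc'): 6}
    return table[frozenset(S)]

# {'a': 2.5, 'b': 2.5, 'c': 1.0}; for a convex game this lies in the core.
print(shapley(['a', 'b', 'c'], v))
```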
We explore settings where a principal must make a decision about which action to take to achieve a desired outcome. The principal elicits the probability of achieving the outcome by following each action from a self-interested (but decision-agnostic) expert. We prove results about the relation between the principal's decision rule and the scoring rules that determine the expert's payoffs. For the most natural decision rule (where the principal takes the action with the highest success probability), we prove that no symmetric scoring rule, nor any of Winkler's asymmetric scoring rules, has desirable incentive properties. We characterize the set of differentiable scoring rules with desirable incentive properties and construct one. We then consider decision markets, where the role of a single expert is replaced by multiple agents that interact by trading with an automated market maker. We prove a surprising impossibility for this setting: an agent can always benefit from exaggerating the success probability of a suboptimal action. To mitigate this, we construct automated market makers that minimize manipulability. Finally, we consider two alternative decision market designs. We prove that the first, in which all outcomes live in the same probability universe, has poor incentive properties. The second, in which the experts trade in the probability of the outcome occurring unconditionally, exhibits a new kind of no-trade phenomenon. Decision Rules and Decision Markets
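For reference, a strictly proper scoring rule such as the quadratic (Brier) rule pays the expert so that reporting the true probability maximizes expected score; this is the symmetric baseline whose incentives, per the result above, break down once the report also drives the principal's decision. A minimal sketch:

```python
def quadratic_score(report, outcome):
    """Quadratic (Brier-style) scoring rule for a binary event:
    strictly proper, so truthful reporting maximizes expected score."""
    return 1.0 - (outcome - report) ** 2

def expected_score(report, true_p):
    return true_p * quadratic_score(report, 1) + \
           (1 - true_p) * quadratic_score(report, 0)

true_p = 0.7
best = max((r / 100 for r in range(101)),
           key=lambda r: expected_score(r, true_p))
print(best)   # 0.7: truthful reporting is optimal in isolation; the paper
              # shows this fails once the report also selects the action
```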
This paper analyzes the worst-case efficiency ratio of false-name-proof combinatorial auction mechanisms. False-name-proofness generalizes strategy-proofness by assuming that a bidder can submit multiple bids under fictitious identifiers. Even the well-known Vickrey-Clarke-Groves mechanism is not false-name-proof. It has previously been shown that there is no false-name-proof mechanism that always achieves a Pareto-efficient allocation. Hence, if false-name bids are possible, we need to sacrifice efficiency to some extent. This leaves the natural question of how much surplus must be sacrificed. To answer this question, this paper focuses on worst-case analysis. Specifically, we consider the fraction of the Pareto-efficient surplus that we obtain, and try to maximize this fraction in the worst case, under the constraint of false-name-proofness. As far as we are aware, this is the first attempt to examine the worst-case efficiency of false-name-proof mechanisms. We show that the worst-case efficiency ratio of any false-name-proof mechanism that satisfies some apparently minor assumptions is at most $2/(m+1)$ for auctions with $m$ different goods. We also observe that the worst-case efficiency ratio of existing false-name-proof mechanisms is generally $1/m$ or 0. Finally, we propose a novel mechanism, called the adaptive reserve price mechanism, which is false-name-proof when all bidders are single-minded, and whose worst-case efficiency ratio is $2/(m+1)$, i.e., optimal. Worst-case efficiency ratio in false-name-proof combinatorial auction mechanisms
We study the complexity of social welfare optimization in multiagent resource allocation. We assume resources to be indivisible and nonshareable, and agents to express their utilities over bundles of resources, where utilities can be represented in either the bundle form or the k-additive form. Solving some of the open problems raised by Chevaleyre et al. and confirming their conjectures, we prove that egalitarian social welfare optimization is NP-complete for both the bundle and the 1-additive form, and both exact utilitarian and exact egalitarian social welfare optimization are DP-complete, each for both the bundle and the 2-additive form, where DP is the second level of the boolean hierarchy over NP. In addition, we prove that social welfare optimization with respect to the Nash product is NP-complete for both the bundle and the 1-additive form. Finally, we briefly discuss hardness of social welfare optimization in terms of inapproximability. Complexity of Social Welfare Optimization in Multiagent Resource Allocation
In this paper, we study a service procurement problem with uncertainty as to whether service providers are capable of completing a given task within a specified deadline. This type of setting is often encountered in large and dynamic multi-agent systems, such as computational Grids or clouds. To effectively deal with this uncertainty, the consumer may dynamically and redundantly procure multiple services over time, in order to increase the probability of success, while at the same time balancing this with the additional procurement costs. However, in order to do this optimally, the consumer requires information about the providers' costs and their success probabilities over time. This information is typically held privately by the providers, and they may have incentives to misreport it, so as to increase their own profits. To address this problem, we introduce a novel mechanism that incentivises self-interested providers to reveal their true costs and capabilities, and we show that this mechanism is ex-post incentive compatible, efficient and individually rational. However, for these properties to hold, it generally needs to compute the optimal solution, which can be intractable in large settings. Therefore, we show how we can generate approximate solutions while maintaining the economic properties of the mechanism. This approximation admits a polynomial-time solution that can be computed in seconds even for hundreds of providers, and we demonstrate empirically that it performs as well as the optimal in typical scenarios. In particularly challenging settings, we show that it still achieves 97\% or more of the optimal. Scalable Mechanism Design for the Procurement of Services with Uncertain Durations
We investigate partial order reduction for model checking multi-agent systems by focusing on interleaved interpreted systems. These are a particular class of interpreted systems, a mainstream MAS formalism, in which only one action at a time is performed in the system. We present a notion of stuttering-equivalence, and prove the semantical equivalence of stuttering-equivalent traces with respect to linear and branching time temporal logics for knowledge without the next operator. We give algorithms to reduce the size of the models before the model checking step and show preservation properties. We evaluate the technique by discussing implementations and the experimental results obtained against well-known examples in the MAS literature. Partial order reductions for model checking temporal epistemic logics over interleaved multi-agent systems
Social laws have proved to be a powerful and theoretically elegant framework for coordination in multi-agent systems. Most existing models of social laws assume that a designer is attempting to produce a set of constraints on agent behaviour which will ensure that some single overall desirable objective is achieved. However, this represents a gross simplification of the typical situation, where a designer may have multiple (possibly conflicting) objectives, with different priorities. Moreover, social laws, as well as bringing benefits, also have implementation costs: imposing a social law often cannot be done at zero cost. We present a model of social laws that reflects this reality: it takes into account both the fact that the designer of a social law may have multiple differently valued objectives, and that the implementation of a social law is not cost-neutral. In this setting, designing a social law becomes an optimisation problem, in which a designer must take into account both the benefits and costs of a social law. We investigate the issue of representing a designer's objectives, characterise the complexity of the optimal social law design problem, and consider possible constraints that lead to reductions in computational complexity. We then show how the problem of designing an optimal social law can be formulated as an integer linear program. Optimal Social Laws
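As an illustration of the final step above, here is a hedged sketch of an integer-linear-program encoding in Python using the PuLP solver. The instance (three candidate constraints with costs, two valued objectives, each achieved only if its required constraints are imposed) and the variable structure are invented for the example; they show one plausible encoding, not necessarily the paper's.

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary, value

# Hypothetical instance: candidate behavioural constraints with costs,
# and designer objectives with values and constraint dependencies.
costs = {'c1': 2, 'c2': 3, 'c3': 4}
objectives = {'o1': (6, ['c1', 'c2']),    # (value, required constraints)
              'o2': (5, ['c2', 'c3'])}

prob = LpProblem("optimal_social_law", LpMaximize)
x = {c: LpVariable(f"x_{c}", cat=LpBinary) for c in costs}        # impose c?
y = {o: LpVariable(f"y_{o}", cat=LpBinary) for o in objectives}   # o achieved?

# Net benefit: value of achieved objectives minus implementation costs.
prob += lpSum(v * y[o] for o, (v, _) in objectives.items()) \
        - lpSum(costs[c] * x[c] for c in costs)

# An objective only counts as achieved if all its constraints are imposed.
for o, (_, req) in objectives.items():
    for c in req:
        prob += y[o] <= x[c]

prob.solve()
print({c: int(value(x[c])) for c in costs}, value(prob.objective))
# Optimum here: impose all three constraints, net benefit 6 + 5 - 9 = 2.
```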
We present a method of distributed model checking of multi-agent systems specified by a branching-time temporal-epistemic logic. We introduce a serial algorithm, central to the distributed approach, for combining binary decision diagrams with bounded model checking. The algorithm is based on a notion of “seed states” to allow for state-space partitioning. Exploring individual partitions displays benefits arising from the verification of partial state-spaces. When verifying both an industrial model and a scalable benchmark scenario, the serial bounded technique was found to be effective. Results for the distributed technique demonstrate that it outperforms the sequential approach for falsifiable formulae. Experimental data indicates that increasing the number of hosts improves verification efficiency. Distributed BDD-based BMC for the Verification of Multi-Agent Systems
We present the logic CTL.STIT, which is the join of the logic CTL with a multi-agent strategic stit-logic variant. CTL.STIT subsumes ATL, and adds expressivity to it that we claim is very important for using the formalism for multi-agent system verification. We argue extensively that the extra expressivity is important for extending ATL to a language for reasoning about norms, strategic games, knowledge games, conditional strategies, etc. We also compare the logic's suitability for multi-agent system verification with verification formalisms based on dynamic logic. We give a number of arguments in favor of stit-formalisms; but which paradigm to use ultimately depends on what kind of properties we want to verify. CTL.STIT: enhancing ATL to express important multi-agent system verification properties
Information security is vital to many multiagent system applications. In this paper we formalise the notion of detectability of attacks in a MAS setting and analyse its applicability. We introduce a taxonomy of detectability specifications expressed in temporal-epistemic logic. We illustrate the practical relevance of attack detectability in a case study applied to a variant of the Kerberos protocol. We model-check attack detectability in automatically generated MAS models for security protocols. Model checking detectability of attacks in multiagent systems
ATL$^+$ is a variant of alternating-time temporal logic that does not have the expressive power of full ATL$^*$, but still allows for expressing some natural properties of agents. It has been believed that verification with ATL$^+$ is $\Delta^P_3$-complete for both memoryless agents and players who can memorize the whole history of the game. In this paper, we show that the latter result is not correct. That is, we prove that model checking ATL$^+$ for agents that use strategies with memory is in fact PSPACE-complete. On a more optimistic note, we show that fairness constraints can be added to ATL$^+$ without further increasing the complexity of model checking, which makes ATL$^+$ an attractive alternative to the full language of ATL$^*$. Verifying Agents with Memory Is Harder than It Seemed
We address the problem of single-agent, autonomous sequential decision making. We assume that some controllers or behavior policies are given as prior knowledge, and the task of the agent is to learn how to switch between these policies. We formulate the problem using the framework of reinforcement learning and options (Sutton, Precup & Singh, 1999; Precup, 2000). We derive gradient-based algorithms for learning the termination conditions of options, with the goal of optimizing the expected long-term return. We incorporate the proposed approach into policy-gradient methods with linear function approximation. Optimal Policy Switching Algorithms for Reinforcement Learning
This paper describes an algorithm, called CQ-learning, which learns to adapt the state representation for multi-agent systems in order to coordinate with other agents. We propose a multi-level approach which builds a progressively more advanced representation of the learning problem. The idea is that agents start with a minimal single-agent state space representation, which is expanded only when necessary. In cases where agents detect conflicts, they automatically expand their state to explicitly take into account the other agents. These conflict situations are then analyzed in an attempt to find an abstract representation which generalises over the problem states. Our system allows agents to learn effective policies, while avoiding the exponential state space growth typical in multi-agent environments. Furthermore, the method we introduce to generalise over conflict states allows knowledge to be transferred to unseen and possibly more complex situations. Our research departs from previous efforts in this area of multi-agent learning because our agents combine state space generalisation with an agent-centric point of view. The algorithms that we introduce can be used in robotic systems to automatically reduce the sensor information to what is essential to solve the problem at hand. This is a must when dealing with multiple agents, since learning in such environments is a cumbersome task due to the massive amount of information, much of which may be irrelevant. In our experiments we demonstrate a simulation of such environments using various gridworlds. Learning multi-agent state space representations
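As a generic reference point, CQ-learning builds on standard tabular Q-learning as its base learner; a minimal sketch of that underlying update (the state-expansion and conflict-detection machinery of CQ-learning itself is not reproduced here):

```python
import random

def q_learning_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.95):
    """Standard tabular Q-learning update: the base learner on top of
    which a state-expansion scheme like CQ-learning's would operate."""
    best_next = max(Q.get((s2, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + \
        alpha * (r + gamma * best_next - Q.get((s, a), 0.0))

def epsilon_greedy(Q, s, actions, eps, rng):
    """Exploration policy commonly paired with Q-learning."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))

# Usage on a toy gridworld transition (state = own position only):
Q, rng = {}, random.Random(0)
actions = ['up', 'down', 'left', 'right']
a = epsilon_greedy(Q, (1, 1), actions, 0.1, rng)
q_learning_update(Q, (1, 1), a, -1.0, (1, 0), actions)
```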
Agent programming in complex, partially observable and stochastic domains usually requires a great deal of understanding of both the domain and the task, in order to provide the agent with the knowledge necessary to act effectively. While symbolic methods allow the designer to specify declarative knowledge about the domain, the resulting plan can be brittle, since it is difficult to supply a symbolic model that is accurate enough to foresee all possible events in complex environments, especially in the case of partial observability. Reinforcement Learning (RL) techniques, on the other hand, can learn a policy and make use of a learned model, but it is difficult to reduce and shape the scope of the learning algorithm by exploiting a priori information. We propose a methodology for writing complex agent programs that can be effectively improved through experience. We show how to derive a stochastic process from a partial specification of the plan, so that the latter's performance can be improved by solving an RL problem much smaller than classical RL formulations. Finally, we demonstrate our approach in the context of Keepaway Soccer, a common RL benchmark based on the RoboCup Soccer 2D simulator. Improving the Performance of Complex Agent Plans Through Reinforcement Learning
A major challenge for traditional approaches to multiagent learning is to train teams that easily scale to include additional agents. The problem is that such approaches typically encode each agent's policy separately. Such separation means that computational complexity explodes as the number of agents in the team increases, and also leads to the problem of reinvention: skills that should be shared among agents must be rediscovered separately for each agent. To address this problem, this paper presents an alternative evolutionary approach to multiagent learning called multiagent HyperNEAT that encodes the team as a pattern of related policies rather than as a set of individual agents. To capture this pattern, a policy geometry is introduced to describe the relationship between each agent's policy and its canonical geometric position within the team. Because policy geometry can encode variations of a shared skill across all of the policies it represents, the problem of reinvention is avoided. Furthermore, because the policy geometry of a particular team can be sampled at any resolution, it acts as a heuristic for generating policies for teams of any size, producing a powerful new capability for multiagent learning. In this paper, multiagent HyperNEAT is tested in predator-prey and room-clearing domains. In both domains the results are effective teams that can be successfully scaled to larger team sizes without any further training. Evolving Policy Geometry for Scalable Multiagent Learning
Decentralized reinforcement learning (DRL) has been applied to a number of distributed applications. However, one of the main challenges faced by DRL is its convergence. Previous work has shown that hierarchical organizational control is an effective way of coordinating DRL to improve its speed, quality, and likelihood of convergence. In this paper, we develop a distributed, negotiation-based approach to dynamically forming such hierarchical organizations. To reduce the complexity of coordinating DRL, our self-organization approach groups strongly interacting learning agents together, whose exploration strategies are coordinated by one supervisor. We formalize this idea by characterizing interactions among agents in a decentralized Markov Decision Process model, and by defining and analyzing a measure that explicitly captures the strength of such interactions. Experimental results show that our dynamically evolving organizations outperform predefined organizations for coordinating DRL. Self-Organization for Coordinating Decentralized Reinforcement Learning
Much past work on solving Markov decision processes (MDPs) using reinforcement learning (RL) has relied on combining parameter estimation methods with hand-designed function approximation architectures for representing value functions. Recently, there has been growing interest in a broader framework that combines representation discovery and control learning, where value functions are approximated using a linear combination of task-dependent basis functions learned during the course of solving a particular MDP. This paper introduces an approach to automatic basis function construction for hierarchical reinforcement learning (HRL). Our approach generalizes past work on basis construction to multi-level action hierarchies by forming a compressed representation of a semi-Markov decision process (SMDP) at multiple levels of temporal abstraction. The specific approach is based on hierarchical spectral analysis of graphs induced on an SMDP's state space from sample trajectories. We present experimental results on benchmark SMDPs, showing significant speedups when compared to hand-designed approximation architectures. Basis Function Construction for Hierarchical Reinforcement Learning
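Spectral basis construction of this kind typically takes the low-order eigenvectors of a graph Laplacian built from sampled transitions as basis functions; a minimal flat (non-hierarchical) sketch with numpy, where the state graph and dimensions are invented for illustration:

```python
import numpy as np

def laplacian_basis(edges, n_states, k):
    """Basis functions from the k smoothest eigenvectors of the normalized
    graph Laplacian of a state graph sampled from trajectories. A flat,
    single-level version of spectral basis construction."""
    A = np.zeros((n_states, n_states))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d = A.sum(axis=1)
    d[d == 0] = 1.0                        # guard against isolated states
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(n_states) - D_inv_sqrt @ A @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    return vecs[:, :k]                     # columns = basis functions

# 5-state chain; the first eigenvectors vary smoothly along the chain.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
phi = laplacian_basis(edges, 5, 3)
print(phi.shape)    # (5, 3): value functions then approximated as phi @ w
```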
Coalitions and cooperation are key topics in multi-agent systems (MAS). They enable agents to achieve goals that they may not have been able to achieve independently. A range of previous studies have found that many problems in coalitional games tend to be computationally intractable - that is, the computational complexity grows rapidly as a function of the number of participating agents. However, these hardness results generally require that each agent is of a different type. Here, we observe that in many MAS settings, while the number of agents may grow, the number of different types of agents remains small. We formally define the notion of agent types in cooperative games. We then re-examine the computational complexity of the different coalition formation problems under the assumption that the number of agent types is fixed. We show that most of the previously hard problems become polynomial when the number of agent types is fixed. We consider multiple different game formulations and representations (characteristic function with subadditive utilities, CRGs, and graphical representations) and several different computational problems (including stability, core-emptiness, and the Shapley value). On Agent Types in Coalition Formation Problems
Autonomous agents transcend their individual capabilities by cooperating towards achieving shared goals. The different viewpoints agents have on the environment cause disagreements about the anticipated effects of plans. Reaching agreement requires the resolution of such inconsistencies and the alignment of the agents' viewpoints. We present a dialogue protocol that enables agents to discuss candidate plans and reach agreements. The dialogue is based on an argumentation process in the language of situation calculus. Agreement is reached through persuasion, thereby aligning the planning beliefs of the agents. We describe our abstract iterated dialogue protocol, and extend it for the specific problem of arguing about plans. We show that our method always terminates and produces sound results. Furthermore, we detail a set of heuristics to simplify reasoning and reduce the exchanged information. Agreeing on Plans Through Iterated Disputes
A multi-agent VRP solver is presented in this paper. It utilizes contract-net-protocol-based allocation and several improvement strategies. It provides solutions with a success ratio of 81\% compared to the optimal solution on 115 benchmark instances, in polynomial time. The self-organization ability of the system successfully minimizes the number of vehicles used. The presented solver architecture supports extensive runtime parallelization with incremental improvement of solution quality. The solver demonstrates applicability to the VRP and easy adaptation to problem variants. Agents Towards Vehicle Routing Problems
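The contract-net allocation underlying such a solver can be sketched generically: customers are announced in turn, each vehicle bids its marginal insertion cost, and the cheapest bid wins. The cost model below is a deliberate simplification (straight-line distance, append-only routes), not the paper's actual bidding scheme:

```python
import math

def insertion_cost(route, depot, customer):
    """Marginal cost of appending a customer to a route (Euclidean)."""
    last = route[-1] if route else depot
    return math.dist(last, customer)

def contract_net_allocate(customers, depot, n_vehicles):
    """Contract-net-style allocation: each customer is announced in turn,
    every vehicle bids its marginal insertion cost, and the lowest bid
    wins. A greedy baseline that improvement strategies would refine."""
    routes = [[] for _ in range(n_vehicles)]
    for c in customers:
        bids = [(insertion_cost(r, depot, c), i) for i, r in enumerate(routes)]
        _, winner = min(bids)
        routes[winner].append(c)
    return routes

depot = (0.0, 0.0)
customers = [(2, 1), (8, 8), (2, 2), (9, 7)]
print(contract_net_allocate(customers, depot, 2))
```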
In this paper, we extend the traditional formalization of a Distributed Constraint Satisfaction Problem (DisCSP) to a Quantified DisCSP. A Quantified DisCSP includes several universally quantified variables, while all of the variables in a traditional DisCSP are existentially quantified. A universally quantified variable represents a choice of nature or an adversary. A Quantified DisCSP formalizes a situation where a team of agents is trying to make a plan that is robust against nature or an adversary. In this paper, we present the formalization of such a Quantified DisCSP and develop an algorithm for solving it. This algorithm generalizes the asynchronous backtracking algorithm used for solving a DisCSP. In this algorithm, agents communicate a value assignment called a good in addition to the nogood used in asynchronous backtracking. Interestingly, the procedures executed by an adversarial/cooperative agent for a good/nogood are totally symmetrical. Furthermore, we develop a method for improving this basic algorithm. Experimental evaluation shows an easy-hard-easy transition as the tightness of the constraints changes, while very loose problem instances are relatively hard. Also, the modification of the basic algorithm is effective and achieves about a 25\% reduction in cycles for the hardest problem instances. Cooperative Problem Solving against Adversary: Quantified Distributed Constraint Satisfaction Problem
When agents need to interact in order to solve some (possibly common) problem, resolving potential conflicts beforehand is often preferred to coordination during execution. Agents may lose some flexibility, but their course of action will be more predictable and often also more efficient, obtaining a socially optimal outcome instead of a local optimum. One way to resolve conflicts beforehand is to give extra constraints to each of the agents such that when they all meet these constraints, the resulting execution is conflict-free. A set of constraints that meets this requirement is called a decoupling of the original problem; if it also maximizes the social welfare (i.e. the sum of the valuations of all the agents), it is called optimal. Representing interesting multiagent problems as a constraint problem, we show that finding an optimal decoupling is at least as hard as finding a solution for the constraint problem. We therefore focus on a constraint problem that is efficiently solvable, but still very relevant and interesting in the context of multiple agents executing their actions, i.e. the Simple Temporal Problem (STP). Two more technical results, then, are that we resolve the open question whether finding an optimal decoupling of the STP is NP-hard (it is), and that if all agents have linear valuation functions, this decoupling problem can be solved efficiently. Optimal Temporal Decoupling in Multiagent Systems
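For background on the object being decoupled: an STP is consistent exactly when its distance graph contains no negative cycle, which Floyd-Warshall detects via a negative diagonal entry. A minimal consistency check (the decoupling machinery itself is not shown):

```python
def stp_consistent(n, constraints):
    """Simple Temporal Problem consistency check.
    constraints: list of (i, j, b) meaning t_j - t_i <= b.
    Consistent iff the distance graph has no negative cycle, which
    Floyd-Warshall exposes as a negative diagonal entry."""
    INF = float('inf')
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i, j, b in constraints:
        d[i][j] = min(d[i][j], b)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return all(d[i][i] >= 0 for i in range(n))

# t1 - t0 in [10, 20], further tightened to t1 - t0 <= 15: consistent.
print(stp_consistent(2, [(0, 1, 20), (1, 0, -10), (0, 1, 15)]))  # True
# Contradictory: t1 - t0 >= 10 and t1 - t0 <= 5.
print(stp_consistent(2, [(1, 0, -10), (0, 1, 5)]))               # False
```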
The Art Gallery Problem asks for a minimum subset of vertices of a polygon that are sufficient to observe its interior. This problem arises in a variety of multiagent systems, including robotics, sensor networks, wireless networking, and surveillance. Despite the fact that the centralized version of the problem has been extensively studied for the past thirty years, there is relatively little in the literature describing distributed solutions to the problem with desirable guarantees in both runtime and optimality. We propose and analyze a new distributed algorithm for approximating a solution to this problem and a number of its variants that runs in a linear number of communication rounds with respect to the number of nodes (independent of the topology of the network), and, under assumptions on the embedding of the edge weights, runs in a logarithmic number of communication rounds producing solutions within a constant factor of optimal. Dominating Sets of Agents in Visibility Graphs: Distributed Algorithms for Art Gallery Problems
In systems based on organisational specifications, a recurring problem remains to be solved: the disparity between the level of abstraction of the organisational concepts and the concepts used in the implementation. Organisational specifications (deliberately) abstract from general practice, which creates a need to relate the abstract concepts used in the specification to concrete ones used in practice. A solution to this problem is the use of counts-as statements, which, by defining the social reality, give the concrete concepts their institutional and organisational meaning. Continuing work on the implementation of counts-as to relate abstract and concrete concepts in agent-based systems, this paper investigates the implementation of counts-as statements in Drools to relate abstract organisational specifications and their norms to concrete situations. Making Norms Concrete
A power describes the ability of an agent to act in some way. While this notion of power is critical in the context of organisational dynamics, and has been studied by others in this light, it must be constrained so as to be useful in any practical application. In particular, we are concerned with how power may be used by agents to govern the imposition and management of norms, and how agents may dynamically assign norms to other agents within a multi-agent system. We approach the problem by defining a syntax and semantics for powers governing the creation, deletion, or modification of norms within a system, which we refer to as normative powers. We then extend this basic model to accommodate more general powers that can modify other powers within the system, and describe how agents playing certain roles are able to apply powers, changing the system's norms, and also the powers themselves. We examine how the powers found within a system may change as the status of norms changes, and show how standard norm modification operations — such as the derogation, annulment and modification of norms — may be represented within our system. A Model of Normative Power
An organizational modeling language can be used to specify an agent organization in terms of its roles, organizational structure, norms, etc. Such an organizational specification imposes constraints on agents that play roles in it, and the agents are expected to take this into account when deciding what to do. This means that agents need to have a basic understanding of what it means to comply with organizational constraints. For this, it is essential that these constraints are precisely specified. In this paper, we address this in the context of the MOISE+ organizational modeling language. We define a semantic framework for MOISE+ MAS and an accompanying linear temporal logic (LTL) to express its properties. We analyze which constraints MOISE+ imposes on agents, and investigate how these can be made precise in LTL. We show that multiple interpretations of constraints are sometimes possible, and explore the space of possibilities. These analyses demonstrate the need for a rigorous specification of organizational constraints, and provide the foundations for the development of agents that understand how to function in a MOISE+ MAS. Formalizing Organizational Constraints: A Semantic Approach
Social norms enable coordination in multiagent systems by constraining agent behaviour in order to achieve a social objective. Automating the design of social norms has been shown to be NP-complete, requiring a complete state enumeration. A planning-based solution has been proposed previously to improve performance. This approach leads to verbose, problem-specific norms due to the propositional representation of the domain. We present a first-order extension of this work that benefits from state and operator abstractions to synthesise more expressive, generally applicable norms. We propose optimisations that can be used to reduce the search performed during synthesis, and formally prove the correctness of these optimisations. Finally, we empirically illustrate the benefits of these optimisations in an example domain. Exploiting Domain Knowledge to Improve Norm Synthesis
Requirements Driven Agent Collaboration (RDAC) is a mechanism in which self-interested service agents actively and autonomously search for the required services submitted by the request agents and compete to offer those services. This mechanism is better suited to the large number of autonomous service providers on the internet than the current service-oriented computing framework. Collaboration is an important issue in this mechanism. This paper focuses on the collaboration issue in RDAC. First, we define the assignment problem in RDAC and show that it is NP-complete. Second, we model it as a Kripke structure with normative systems on it. This makes it possible to bridge the assignment problem and existing efforts in normative systems, games, mechanisms, etc. Third, a negotiation-based approach is given to solve the problem, and the performance of the negotiation is evaluated by simulation. Finally, a tool for RDAC has been implemented to put it into practice and for further testing and evaluation. Assignment Problem in Requirements Driven Agent Collaboration and its Implementation
We introduce a game setting called a joint process, where the history of actions determines the state, and the state and agent properties determine the payoff. This setting is a special case of stochastic games and is a natural model for situations with alternating control. Joint process games have applications as diverse as aggregate rating sites and wiki page updates. These games are related to Black's median voter theorem and also strongly connected to Moulin's strategy-proof voting schemes. When each agent has a personal goal, we look at how the play converges under a simple myopic action rule, and prove that not only do these simple dynamics converge, but the actions selected also form a Nash equilibrium. The convergence point is not the mean or the median of the set of agent goals; instead, we prove the convergence point is the median of the set of agent goals and a set of focal points. This work provides the first theoretical model of wiki-type behavior and opens the door to more questions about the properties of these games. Joint Process Games: From Ratings to Wikis
In this paper, we propose a novel general framework for analysing competing double auction markets that vie for traders, who then need to choose which market to go to. Based on this framework, we analyse the competition between two markets in detail. Specifically, we game-theoretically analyse the equilibrium behaviour of traders' market selection strategies and adopt evolutionary game theory to investigate how traders dynamically change their strategies, and thus, which equilibrium, if any, can be reached. In so doing, we show that it is unlikely for these competing markets to coexist. Eventually, all traders will always converge to locating themselves at one of the markets. Somewhat surprisingly, we find that sometimes all traders converge to the market that charges higher fees. Thus we further analyse this phenomenon, and specifically determine the factors that affect such migration. A Game-Theoretic Analysis of Market Selection Strategies for Competing Double Auction Marketplaces
We consider (prediction) markets where myopic agents sequentially interact with an automated market maker. We show a broad negative result: by varying the order of participation, the market's aggregate prediction can converge to an arbitrary value. In other words, markets may fail to do any meaningful belief aggregation. On the positive side, we show that under a random participation model, steady-state prices equal those of the traditional static prediction market model. We discuss applications of our results to the failure of the 1996 Iowa Electronic Market. When Do Markets with Simple Agents Fail?
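To make the order-dependence intuition concrete, the following toy simulation (my own sketch, which omits the belief-updating dynamics the paper analyzes) uses a logarithmic market scoring rule (LMSR) market maker. A myopic, risk-neutral trader simply moves the price to her own belief, so the final price is whatever the last trader believed, and reordering the same traders changes the "aggregate" prediction arbitrarily:

```python
import math

def lmsr_yes_price(q_yes, q_no, b=10.0):
    # Instantaneous price of the YES security under a logarithmic
    # market scoring rule with liquidity parameter b.
    ey, en = math.exp(q_yes / b), math.exp(q_no / b)
    return ey / (ey + en)

def myopic_trade(q_yes, q_no, belief, b=10.0):
    # A myopic, risk-neutral trader moves the LMSR price to her
    # current belief: she trades until q_yes - q_no = b * logit(belief).
    delta = b * math.log(belief / (1.0 - belief)) - (q_yes - q_no)
    return q_yes + delta, q_no

def run(order, beliefs, b=10.0):
    q_yes = q_no = 0.0
    for i in order:
        q_yes, q_no = myopic_trade(q_yes, q_no, beliefs[i], b)
    return lmsr_yes_price(q_yes, q_no, b)

beliefs = [0.2, 0.5, 0.9]
print(run([0, 1, 2], beliefs))  # ends at 0.9 (last trader's belief)
print(run([2, 1, 0], beliefs))  # ends at 0.2 (same traders, other order)
```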
In the course allocation problem, a university administrator seeks to efficiently and fairly allocate schedules of over-demanded courses to students with heterogeneous preferences. We investigate how to computationally implement a recently-proposed theoretical solution to this problem (Budish, 2009) which uses approximate competitive equilibria to balance notions of efficiency, fairness, and incentives. Despite the apparent similarity to the well-known combinatorial auction problem, we show that no polynomial-size mixed-integer program (MIP) can solve our problem. Instead, we develop a two-level search process: at the master level, the center uses tabu search over the union of two distinct neighborhoods to suggest prices; at the agent level, we use MIPs to solve for student demands in parallel at the current prices. Our method scales near-optimally in the number of processors used and is able to solve realistic-size problems fast enough to be usable in practice. Finding Approximate Competitive Equilibria: Efficient and Fair Course Allocation
We investigate the problem of allocating items (private goods) among competing agents in a setting that is both prior-free and payment-free. Specifically, we focus on allocating multiple heterogeneous items between two agents with additive valuation functions. Our objective is to design strategy-proof mechanisms that are competitive against the most efficient (first-best) allocation. We introduce the family of linear increasing-price (LIP) mechanisms. The LIP mechanisms are strategy-proof, prior-free, and payment-free, and they are exactly the increasing-price mechanisms satisfying a strong responsiveness property. We show how to solve for competitive mechanisms within the LIP family. For the case of two items, we find a LIP mechanism whose competitive ratio is near optimal (the achieved competitive ratio is $0.828$, while any strategy-proof mechanism is at most $0.841$-competitive). As the number of items goes to infinity, we prove a negative result that any increasing-price mechanism (linear or nonlinear) has a maximal competitive ratio of $0.5$. Our results imply that in some cases, it is possible to design good allocation mechanisms without payments and without priors. Strategy-proof Allocation of Multiple Items between Two Agents without Payments or Priors
Minimax-regret preference elicitation allows intelligent decisions to be made on behalf of people facing risky choices. Standard gamble queries, a vital tool in this type of preference elicitation, assume that the people from whom preference information is being elicited can be modeled using expected utility theory. However, there is strong evidence from psychology that people may systematically deviate from expected utility theory. Cumulative prospect theory is an alternative model to expected utility theory which has been shown empirically to better explain humans' decision making in risky settings. We show that the current minimax-regret preference elicitation techniques can fail to properly elicit appropriate information if the preferences of the user follow cumulative prospect theory. As a result, we develop a new querying method for preference elicitation that is applicable to cumulative prospect theory models. Simulations show that our method can effectively elicit information for decision making in both cumulative prospect theory and expected utility theory settings, resulting in a flexible and effective preference elicitation method. Preference Elicitation for Risky Prospects
The vision of the Smart Grid includes the creation of intelligent electricity supply networks that allow efficient use of energy resources, reduce carbon emissions, and are robust to failures. One of the key assumptions underlying this vision is that it will be possible to manage the \emph{trading} of electricity between homes and micro-grids while coping with the inherent real-time dynamism in electricity demand and supply. The management of these trades needs to take into account the fact that most, if not all, of the actors in the system are self-interested and that transmission line capacities are constrained. Against this background, we develop and evaluate a novel market-based mechanism and novel trading strategies for the Smart Grid. Our mechanism is based on the Continuous Double Auction (CDA) and automatically manages the congestion within the system by pricing the flow of electricity. We also introduce mechanisms to ensure the system can cope with unforeseen demand or increased supply capacity in real time. Finally, we develop new strategies that we show achieve high market efficiency (typically over 90\%). Trading Agents for the Smart Electricity Grid
Robotic swarms, like all spatial computers, are a challenging environment for the execution of distributed consensus algorithms due to their scale, diameter, and frequent failures. Exact consensus is generally impractical on spatial computers, so we consider approximate consensus algorithms. In this paper, we show that the family of self-organizing protocols based on the graph Laplacian of a network are impractical as well. With respect to the structure of a finite-neighborhood spatial computer, we find that these protocols have an expected convergence time of $O(diameter^2)$ when the inputs are strongly correlated with location. Verifying this result in simulation, we further determine that the constant factor on the convergence time is high, rendering Laplacian-based approximate consensus unsuitable for general use on spatial computers. Laplacian-Based Consensus on Spatial Computers
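A minimal sanity check of the quadratic scaling (my own sketch; the paper's simulations and constant-factor analysis are more involved) runs the standard discrete Laplacian protocol $x(t+1) = (I - \epsilon L)\,x(t)$ on a path graph, the extreme case for diameter, with location-correlated inputs, and counts iterations to near-consensus:

```python
import numpy as np

def path_laplacian(n):
    # Graph Laplacian L = D - A of an n-node path.
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1
    return np.diag(A.sum(axis=1)) - A

def consensus_steps(n, eps=0.25, tol=1e-2):
    L = path_laplacian(n)
    x = np.linspace(0.0, 1.0, n)   # inputs strongly correlated with location
    target = x.mean()              # Laplacian protocols preserve the mean
    steps = 0
    while np.max(np.abs(x - target)) > tol:
        x = x - eps * (L @ x)      # x(t+1) = (I - eps*L) x(t)
        steps += 1
    return steps

for n in [8, 16, 32]:
    print(n, consensus_steps(n))   # steps grow roughly 4x per doubling
```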
Several researchers, present authors included, envision personal mobile robot agents that can assist humans in their daily tasks. Despite many advances in robotics, such mobile robot agents still face many limitations in their perception, cognition, and action capabilities. In this work, we propose a symbiotic interaction between robot agents and humans to overcome the robot limitations while allowing robots to also help humans. We introduce a visitor's companion robot agent as a natural task for such symbiotic interaction: e.g., the visitor lacks knowledge of the environment but can easily open a door or read a door label, while the mobile robot with no arms cannot open a door and may be confused about its exact location, but can plan paths well through the building and can provide useful relevant information to the visitor. We present this visitor companion task in detail with an enumeration and formalization of the actions of the robot agent in its interaction with the human. We briefly describe the WiFi-based robot localization algorithm and show results of the different levels of human help to the robot during the robot navigation. We then model the tradeoffs of the value of the robot help to the human and present illustrative experiments. Our work has been fully implemented in a mobile robot agent, CoBot, which has successfully navigated for several hours and continues to navigate in our indoor environment. An Effective Personal Mobile Robot Agent Through Symbiotic Human-Robot Interaction
Although a remarkably high degree of automation has been reached in production and intra-logistics nowadays, human labor is still used for transportation with handcarts and forklifts, with high labor cost and risk of injury as undesirable consequences. Alternative approaches in automated warehouses are fixed conveyor systems, installed either overhead or on the floor. The drawback of such solutions is the lack of flexibility, which is necessary when the production lines of the company change: the installation then has to be rebuilt. In this paper, we propose a novel approach in which decentralized teams of autonomous robots perform intra-logistics tasks using distributed algorithms; centralized solutions suffer from limited scalability and have a single point of failure. The task is to transport material between stations while keeping the communication network structure intact and, most importantly, facilitating a fair distribution of robots among loading stations. Our approach is motivated by strategies from peer-to-peer networks and mobile ad-hoc networks. In particular, we use an adapted version of distributed heterogeneous hash tables (DHHT) for distributing the tasks and localized communication. Experimental results presented in this paper show that our method reaches a fair distribution of robots over loading stations. Decentralized Hash Tables For Mobile Robot Teams Solving Intra-Logistics Tasks
The central problem of designing intelligent robot systems which learn by demonstrations of desired behaviour has been largely studied within the field of robotics. Numerous architectures for action recognition and prediction of intent of a single teacher have been proposed. However, little work has been done addressing how a group of robots can learn by simultaneous demonstrations of multiple teachers. This paper contributes a novel approach for learning multirobot joint action plans from unlabelled data. The robots first learn the demonstrated sequence of individual actions using the HAMMER architecture. Subsequently, the group behaviour is segmented over time and space by applying a spatio-temporal clustering algorithm. The experimental results, in which humans teleoperated real robots during a search and rescue task deployment, successfully demonstrated the efficacy of combining action recognition at the individual level with group behaviour segmentation, spotting the exact moment when robots must form coalitions to achieve the goal, thus yielding reasonable generation of multirobot joint action plans. Learning Multirobot Joint Action Plans from Simultaneous Task Execution Demonstrations
We consider a heterogeneous swarm consisting of aerial and wheeled robots. We present a system that enables spatially targeted communication: it enables aerial robots to establish dedicated communication links with individual wheeled robots or with selected groups of wheeled robots based on their position in the environment. The system does not rely on any form of global information. We show how a spatially targeted one-to-one communication link can be established using a simple LED and camera based communication modality. We provide a probabilistic model of our approach to derive an upper bound on the average time required for establishing communication. In simulation, we show that our approach scales well. Furthermore, we show how our approach can be extended to establish a spatially targeted one-to-many communication link between an aerial robot and a specific number of co-located wheeled robots. The heterogeneous swarm robotic hardware is currently under development. We therefore demonstrate the proposed approach on an existing multirobot system consisting of only wheeled robots by letting one of the wheeled robots assume the role of an aerial robot. Establishing Spatially Targeted Communication in a Heterogeneous Robot Swarm
We describe a formalism and algorithms for game-tree search in partially-observable Euclidean space, and implementation and tests in a scenario where a multi-agent team, called tracking agents, pursues a target agent that wants to evade the tracking agents. Our contributions include: (1) a formalism that combines geometric elements (agents' locations and trajectories, observable regions, and obstacles that restrict mobility and observability) with game-theoretic elements (information sets, utility functions, and strategies); (2) a recursive formula for information-set minimax values based on our formalism, and an implementation of the formula in a game-tree search algorithm; (3) a heuristic evaluation function for use at the leaf nodes of the game-tree search, which works by doing a quick lookahead search of its own in a relaxed version of the problem; and (4) experimental results in 500 randomly generated trials. With the strategies generated by our heuristic, the tracking agents were more than twice as likely to know the target agent's location at the end of the game than with the strategies generated by heuristics that compute estimates of the target's possible locations. Strategy Generation in Multi-Agent Imperfect-Information Pursuit Scenarios
To adequately deal with the unpredictable and dynamic environments normative frameworks are typically deployed in, mechanisms for modifying the norms at runtime are crucial. We present the syntax and operational semantics of generic programming constructs to facilitate runtime norm modification, allowing a programmer to specify when and how the norms may be changed by external agents or by the normative framework. The norms take on the form of conditional obligations and prohibitions, which instantiate detached obligations and prohibitions (instances). We present rule-based constructs for runtime modification of the norms and their instances, and a mechanism for automatically updating the instances when their underlying norms change. Programming Norm Change
This paper proposes a combined mechanism for coordinating agents in timed normative multi-agent systems. Timing constraints in a multi-agent system make it possible to force action execution to happen before certain time invariants are violated. In such multi-agent systems we achieve coordination at two orthogonal levels with respect to states and actions. On the one hand, the behaviour of individual agents is regulated by means of socially and organisationally inspired concepts like norms and sanctions. On the other hand, the behaviour of sets of agents is restricted according to action-based coordination mechanisms called choreographies. In both cases, the resulting behaviour is constrained by time. Strategic Executions of Choreographed Timed Normative Multi-Agent Systems
The execution of an artificial agent is usually implemented with a sense—reason—act cycle. This cycle includes tasks such as event processing, generating and revising plans, and selecting actions to execute. However, there are typically many choices in the design of such a cycle, which are often hard-coded in the cycle in an ad hoc way. The question of this paper is how one decides, in a principled way, how often and which reasoning rules to apply, how to interleave the execution of plans, or when to start replanning. This paper proposes and formalizes the eliciting conditions of hope, fear, joy, and distress according to a well-known psychological model of human emotion. These conditions are then used to reduce the choices an agent can make in each state. They formalize the idea that emotions focus an agent's attention on what is important in each state. Emotions to Control Agent Deliberation
We consider the problem of allocating networked resources in dynamic environments, such as cloud computing platforms, where providers strategically price resources to maximize their utility. Resource allocation in these environments, where both providers and consumers are selfish agents, presents numerous challenges, since the number of consumers and their resource demands are highly dynamic. While numerous auction-based approaches have been proposed in the literature, this paper explores an alternative approach where providers and consumers automatically negotiate resource leasing contracts. Since resource demand and supply can be dynamic and uncertain, we propose a distributed negotiation mechanism where agents negotiate over both a contract price and a decommitment penalty, which allows agents to decommit from contracts at a cost. We compare our approach experimentally, using representative scenarios and workloads, to both combinatorial auctions and the fixed-price model used by Amazon's Elastic Compute Cloud, and show that the negotiation model achieves a higher social welfare. Automated Negotiation with Decommitment for Dynamic Resource Allocation in Cloud Computing
The primary target of this work is human-robot collaboration, especially for service robots in complicated application scenarios. Three assumptions and four requirements are identified. State-of-the-art, general-purpose Natural Language Processing (NLP), Commonsense Reasoning (in particular, ASP), and Robotics techniques are integrated in a layered architecture. The architecture and mechanisms have been implemented on a service robot, Ke Jia. Instead of command languages, small limited segments of natural language are employed in spoken dialog between Ke Jia and its users. The information in the dialog is extracted, classified and transferred into an inner representation by Ke Jia's NLP mechanism, and further used autonomously in problem-solving and planning. A series of case studies was conducted on Ke Jia with positive results, verifying its ability to acquire knowledge through spoken dialog with users, to autonomously solve problems by virtue of the acquired causal knowledge, and to autonomously plan for complex tasks. Developing High-level Cognitive Functions for Service Robots
Verification of multi-agent programs is a key problem in agent research and development. One may for example be interested in checking if a specific subset of individual agent programs can achieve a particular state of their shared environment, or if they can protect the system from entering a "bad" state. This paper focuses on BDI-based multi-agent programs that consist of a finite set of BDI-based agent programs executed concurrently. We choose alternating-time temporal logic (ATL) for expressing such properties. However, the original ATL is based on a synchronous model of multi-agent computation, while most (if not all) multi-agent programming frameworks use asynchronous semantics where activities of different agents are interleaved. Moreover, unlike in ATL, our agent programs do not have perfect information about the current global state of the system. They are not appropriate subjects of modal epistemic logic either (since they do not know the global model of the system). We begin by adapting the semantics of ATL to the situation at hand; then, we consider the verification problem in the new setting and present some preliminary results. Reasoning about Strategies of Multi-Agent Programs
A major research challenge in multi-agent systems is the problem of partitioning a set of agents into mutually disjoint coalitions, such that the overall performance of the system is optimized. This problem is difficult because the search space is very large: the number of possible coalition structures increases exponentially with the number of agents. Although several algorithms have been proposed to tackle this Coalition Structure Generation (CSG) problem, all of them suffer from being inherently centralized, which leads to the existence of a performance bottleneck and a single point of failure. In this paper, we develop the first decentralized algorithm for solving the CSG problem optimally. In our algorithm, the necessary calculations are distributed among the agents, instead of being carried out centrally by a single agent (as is the case in all the available algorithms in the literature). In this way, the search can be carried out in a much faster and more robust way, and the agents can share the burden of the calculations. The algorithm combines, and improves upon, techniques from two existing algorithms in the literature, namely DCVC and IP, and applies novel techniques for filtering the input and reducing the inter-agent communication load. A Distributed Algorithm for Anytime Coalition Structure Generation
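For contrast with the decentralized approach, here is the naive centralized baseline that makes the search space explicit: enumerating all coalition structures (set partitions, whose number grows as the Bell numbers) and scoring each against a characteristic function. This is my own illustrative sketch, not the DCVC/IP-based algorithm of the paper:

```python
from itertools import combinations

def partitions(agents):
    # Enumerate all coalition structures (set partitions) of the agents.
    if not agents:
        yield []
        return
    first, rest = agents[0], agents[1:]
    # Put `first` in a coalition with every subset of the remaining agents.
    for k in range(len(rest) + 1):
        for combo in combinations(rest, k):
            coalition = (first,) + combo
            remaining = [a for a in rest if a not in combo]
            for sub in partitions(remaining):
                yield [coalition] + sub

def best_structure(agents, v):
    # v maps a frozenset coalition to its value; pick the structure
    # maximizing the sum of coalition values (exhaustive search).
    return max(partitions(list(agents)),
               key=lambda cs: sum(v[frozenset(c)] for c in cs))

# Toy superadditive characteristic function over three agents.
v = {frozenset(s): len(s) ** 2 for i in range(1, 4)
     for s in combinations("abc", i)}
print(best_structure("abc", v))  # the grand coalition [('a', 'b', 'c')]
```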
Distributed Constraint Optimization (DCOP) is a powerful framework for representing and solving distributed combinatorial problems, where the variables of the problem are owned by different agents. DCOP algorithms search for the optimal solution, optimizing the total gain (or cost) composed of the gains of all agents. Local search (LS) DCOP algorithms search locally for an approximate solution. Many multi-agent problems include constraints that produce different gains (or costs) for the participating agents. Asymmetric gains of constrained agents cannot be naturally represented by the standard DCOP model. The present paper proposes a general framework for Asymmetric DCOPs (ADCOPs). The new framework is described and its differences from former attempts are discussed. New local search algorithms for ADCOPs are introduced and their advantages over existing algorithms and over former representations are discussed in detail. The new proposed algorithms for the ADCOP framework are evaluated experimentally and their performance compared to existing algorithms. Two measures of performance are used: quality of solutions and loss of privacy. The results show that the new algorithms significantly outperform existing DCOP algorithms with respect to both measures. Local search for Distributed Asymmetric Optimization
In this paper, we propose the Quantified Distributed Constraint Optimization problem (QDCOP), which extends the framework of Distributed Constraint Optimization problems (DCOPs). DCOPs have been studied as a fundamental model of multi-agent cooperation. In traditional DCOPs, all agents cooperate to optimize the sum of their cost functions. However, in practical systems some agents may desire to select the value of their variables without cooperation. In special cases, such agents may take the values with the worst impact on the quality of the result reachable by the optimization process. We apply existential/universal quantifiers to distinct uncooperative variables. A universally quantified variable is left unassigned by the optimization, as the result has to hold when it takes any value from its domain, while an existentially quantified variable takes exactly one of its values for each context. Similar classes of problems have recently been studied as (Distributed) Quantified Constraint Problems, where the variables of the CSP have quantifiers and all constraints should be satisfied independently of the values taken by universal variables. Our QDCOP applies the concept of game-tree search to DCOP. If the original problem is a minimization problem, agents that own universally quantified variables may intend to maximize the cost value in the worst case, while the other agents intend to minimize it as usual. Therefore, only bounds, especially upper bounds, on the optimal value are guaranteed. The purpose of the new class of problems is to compute such bounds, as well as to compute sub-optimal solutions. For the QDCOP, we also propose several methods that are based on min-max/alpha-beta and ADOPT algorithms. A Quantified Distributed Constraint Optimization Problem
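The quantifier semantics can be illustrated by a tiny centralized game-tree evaluation (my own sketch; the paper's methods are distributed, ADOPT- and alpha-beta-based): existential variables choose values to minimize the cost, while universal variables are evaluated against their worst-case (maximizing) value, so the returned number is a guaranteed upper bound on the achievable cost:

```python
def qdcop_value(variables, assignment, cost):
    # variables: list of (name, domain, quantifier) in elimination order;
    # 'E' (existential/cooperative) minimizes, 'A' (universal/uncooperative)
    # is evaluated adversarially, i.e. it maximizes, as in the worst case.
    if not variables:
        return cost(assignment)
    (name, domain, quant), rest = variables[0], variables[1:]
    values = (qdcop_value(rest, {**assignment, name: v}, cost)
              for v in domain)
    return max(values) if quant == 'A' else min(values)

# Two cooperative variables and one adversarial variable, binary domains,
# with a toy additive cost function over pairwise constraints.
def cost(a):
    return (a['x'] ^ a['z']) + (a['y'] ^ a['z']) + a['x'] + a['y']

vs = [('x', [0, 1], 'E'), ('y', [0, 1], 'E'), ('z', [0, 1], 'A')]
print(qdcop_value(vs, {}, cost))  # guaranteed worst-case cost bound: 2
```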
Recent studies have investigated how a team of mobile sensors can cope with real world constraints, such as uncertainty in the reward functions, dynamically appearing and disappearing targets, technology failures and changes in the environmental conditions. In this study we consider an additional element, deception by an adversary, which is relevant in many (military) applications. The adversary is expected to use deception to prevent the sensor team from performing its tasks. We employ a game theoretic model to analyze the expected strategy of the adversary and find the best response. More specifically, we consider an adversary that deceptively changes the importance that agents give to targets in the area: the opponent is expected to use camouflage in order to create confusion among the sensors regarding the importance of targets, and thus reduce the team's efficiency in target coverage. We represent a Mobile Sensor Team problem using the Distributed Constraint Optimization Problem (DCOP) framework. We propose an optimal method for the selection of a position of a single agent facing a deceptive adversary. This method serves as a heuristic for agents to select their positions in a full-scale problem with multiple agents in a large area. Our empirical study demonstrates the success of our model as compared with existing models in the presence of deception. Deception in Networks of Mobile Sensing Agents
Many applications in multiagent learning are essentially convex optimization problems in which agents have only limited communication and partial information about the function being minimized (examples of such applications include, among others, coordinated sensor localization, distributed adaptive filtering, control, and coordination). Given this observation, we propose a new non-hierarchical decentralized algorithm for the asymptotic minimization of possibly time-varying convex functions. In our method each agent has knowledge of a time-varying local cost function, and the objective is to minimize asymptotically a global cost function defined by the sum of the local functions. At each iteration of our algorithm, agents improve their estimates of a minimizer of the global function by applying a particular version of the adaptive projected subgradient method to their local functions. Then the agents exchange and mix their improved estimates using a probabilistic model based on recent results in weighted average consensus algorithms. The resulting algorithm is provably optimal and reproduces as particular cases many existing algorithms (such as consensus algorithms and recent methods based on the adaptive projected subgradient method). To illustrate one possible application, we show how our algorithm can be applied to coordinated acoustic source localization in sensor networks. Distributed Multiagent Learning with a Broadcast Adaptive Subgradient Method
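The two ingredients of such schemes, a local (sub)gradient step followed by consensus mixing with neighbors, can be sketched in a few lines. This is a simplified synchronous variant with static local quadratics and a fixed ring topology, not the paper's asynchronous, probabilistic, time-varying algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                                   # number of agents
targets = rng.normal(size=n)             # local cost f_i(x) = (x - t_i)^2

# Doubly stochastic mixing matrix for a ring network (neighbor averaging).
W = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i, i + 1):
        W[i, j % n] = 1.0 / 3.0

x = np.zeros(n)                          # each agent's local estimate
for k in range(1, 2001):
    grad = 2.0 * (x - targets)           # local (sub)gradients
    x = x - (1.0 / k) * grad             # diminishing-step subgradient step
    x = W @ x                            # exchange and mix with neighbors

print(np.round(x, 2))                    # estimates cluster near the optimum
print("global minimizer:", round(targets.mean(), 2))
```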
Coordination of agent activities in non-deterministic, distributed environments is computationally difficult. Distributed Constraint Optimization (DCOP) provides a rich framework for modeling such multi-agent coordination problems, but existing representations, problem domains, and techniques for DCOP focus on small ($<$100 variables), deterministic solutions. We present a novel approach to DCOP for large-scale applications that contain uncertain outcomes. These types of real-time domains require distributed, scalable algorithms to meet difficult bounds on computation and communication time. To achieve this goal, we develop a new distributed neighbor exchange algorithm for DCOPs that scales to problems involving hundreds of variables and constraints and offers faster convergence to high quality solutions than existing DCOP algorithms. In addition, our complete solution includes new techniques for dynamic distributed constraint optimization and uncertainty in constraint processing. We validate our approach using test scenarios from the DARPA Coordinators program and show that our solution is very competitive with existing approaches. Coordination for Uncertain Outcomes using Distributed Neighbor Exchange
Alternating-time Temporal Logic (ATL) is widely used to reason about strategic abilities of players. Aiming at strategies that can realistically be implemented in software, many variants of ATL study a setting with incomplete information, where strategies may only take available information into account. Another generalization of ATL is Probabilistic ATL, where strategies are required to achieve their goal with a certain probability. We introduce a semantics of ATL that takes into account both of these aspects. We prove that our semantics allows simulation relations similar in spirit to usual bisimulations, and has a decidable model checking problem (in the case of memoryless strategies, with memory-dependent strategies the problem is undecidable). Strategic Planning for Probabilistic Games with Incomplete Information
Agents often have to solve a series of similar search problems. Adaptive A* is a recent incremental heuristic search algorithm that solves such problems faster than A* by updating the h-values using information from previous searches. In this paper, we address multi-target search problems in the Adaptive A* framework. Although we could solve such problems by calculating the optimal path to each target, this would be inefficient, especially when the number of targets is large. We consider two cases, whose goals are that (1) an agent reaches one of the targets and (2) an agent has to reach all of the targets. We propose a method to solve such problems while keeping the h-values consistent. Our experiments demonstrate that the proposed method is more efficient than the straightforward method. Multi-Target Adaptive A*
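For reference, the core of Adaptive A* is its post-search heuristic update: if the last search found a goal at cost $g^*$, then every expanded state $s$ satisfies $g^* - g(s) \le$ the true goal distance of $s$, so taking the maximum with the old h-value yields a more informed admissible heuristic for subsequent searches. A sketch of that update (names assumed; the multi-target complications the paper addresses are not shown):

```python
def update_h(h, g, goal_cost):
    # Adaptive A* update: after a search that found a path of cost
    # `goal_cost`, every expanded state s with g-value g[s] satisfies
    # goal_cost - g[s] <= true distance from s to the goal, so this is
    # an admissible (and more informed) heuristic for later searches.
    for s, gs in g.items():
        h[s] = max(h.get(s, 0.0), goal_cost - gs)
```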
Planning how to interact against bounded memory and unbounded memory learning opponents needs different treatment. Thus far, however, work in this area has shown how to design plans against bounded memory learning opponents, but no work has dealt with the unbounded memory case. This paper tackles this gap. In particular, we frame this as a planning problem using the framework of repeated matrix games, where the planner's objective is to compute the best exploiting sequence of actions against a learning opponent. The particular class of opponent we study uses a fictitious play process to update her beliefs, but the analysis generalizes to many forms of Bayesian learning agents. Our analysis is inspired by Banerjee and Peng's AIM framework, which works for planning and learning against bounded memory opponents (e.g., an adaptive player). Building on this, we show how an unbounded memory opponent (specifically a fictitious player) can also be modelled as a finite MDP, and present a new efficient algorithm that can find a way to exploit the opponent by computing in polynomial time a sequence of play that obtains a higher average reward than playing a game theoretic (Nash or correlated) equilibrium. Planning Against Fictitious Players in Repeated Normal Form Games
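The exploitation idea can be seen in miniature: a fictitious player best-responds to the empirical frequency of the planner's past actions, so the planner can "teach" a belief and then cash in on it. The payoff matrices below are hypothetical, and this sketch only evaluates hand-written plans rather than computing the optimal sequence via the paper's MDP formulation:

```python
import numpy as np

# Planner's payoff A[i, j] for planner action i vs opponent action j,
# and the opponent's payoff matrix B (toy numbers, not from the paper).
A = np.array([[3.0, 0.0],
              [1.0, 2.0]])
B = np.array([[0.0, 2.0],
              [3.0, 1.0]])

def exploit(plan, rounds=100):
    counts = np.ones(2)   # opponent's fictitious-play counts of our actions
    total = 0.0
    for t in range(rounds):
        mix = counts / counts.sum()
        j = int(np.argmax(B.T @ mix))   # her best response to our history
        i = plan(t)                     # planner's scripted action
        total += A[i, j]
        counts[i] += 1
    return total / rounds

print(exploit(lambda t: 0))                    # a fixed action is punished
print(exploit(lambda t: 0 if t < 30 else 1))   # "teach", then switch: higher
```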
Moving target search is important for robotics applications where unmanned ground vehicles (UGVs) have to follow other friendly or hostile UGVs. Artificial intelligence researchers have recently used incremental search to speed up the computation of a simple strategy for the hunter. The fastest incremental search algorithm, Fringe-Retrieving A*, solves moving target search problems only on two-dimensional grids, which are rather unrealistic models for robotics applications. We therefore generalize it to Generalized Fringe-Retrieving A*, which solves moving target search problems on arbitrary graphs, including the state lattices used for UGV navigation. Generalized Fringe-Retrieving A*: Faster Moving Target Search on State Lattices
In many AI settings an agent is comprised of both action-planning and action-execution components. We examine the relationship between the precision of the execution component, the intelligence of the planning component, and the overall success of the agent. Our motivation lies in determining whether higher execution skill rewards more strategic playing. We present a computational billiards framework in which the interaction between skill and strategy can be experimentally investigated. By comparing the performance of different agents with varying levels of skill and strategic intelligence we show that intelligent planning can contribute most to an agent's success when that agent has imperfect skill. Success, strategy and skill: an experimental study
In this paper we study, for the first time explicitly, the implications of endowing an interested party (i.e., a teacher) with the ability to modify the underlying dynamics of the environment, in order to encourage an agent to learn to follow a specific policy. We introduce a cost function which can be used by the teacher to balance the modifications it makes to the underlying environment dynamics against the learner's performance compared to some ideal, desired policy. We formulate the teacher's problem of determining optimal environment changes as a planning and control problem, and empirically validate the effectiveness of our model. Cultivating Desired Behavior: Policy Teaching Via Environment-Dynamics Tweaks
Cooperative games provide an appropriate framework for fair and stable resource allocation in multiagent systems. This paper focusses on monotone cooperative games, a class which comprises a variety of games that have enjoyed special attention within AI, in particular, skill games, connectivity games, flow games, voting games, and matching games. Given a threshold, each monotone cooperative game naturally corresponds to a simple game. The core of a threshold version may be empty, even if that is not the case in the monotonic game itself. For each of the subclasses of monotonic games mentioned above, we conduct a computational analysis of problems concerning some relaxations of the core such as the least-core and the cost of stability. It is shown that threshold versions of monotonic games are generally at least as hard to handle computationally. We also introduce the length of a simple game as the size of the smallest winning coalition and study its computational complexity in various classes of simple games and its relationship with computing core-based solutions. A number of computational hardness results are contrasted with polynomial time algorithms to compute the length of threshold matching games and the cost of stability of matching games, spanning connectivity games, and simple coalitional skill games with a constant number of skills. Monotone cooperative games and their threshold versions
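For reference (these are the standard definitions, not this paper's specific formulations): the least-core of a game $(N, v)$ relaxes the core constraints uniformly,

$$\min_{\varepsilon,\,x}\ \varepsilon \quad \text{s.t.}\quad x(N) = v(N),\qquad x(S) \ge v(S) - \varepsilon\ \ \text{for all}\ \emptyset \ne S \subsetneq N,$$

while the cost of stability is the least external subsidy $\Delta \ge 0$ such that the adjusted game with $v'(N) = v(N) + \Delta$ (and $v'(S) = v(S)$ otherwise) has a nonempty core. The exponential number of constraints in these programs is what makes class-specific polynomial algorithms, like those given here for matching and skill games, worthwhile.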
Bayesian games can be used to model single-shot decision problems in which agents only possess incomplete information about other agents, and hence are important for multiagent coordination under uncertainty. Moreover, they can be used to represent different stages of sequential multiagent decision problems, such as POSGs and DEC-POMDPs, and appear as an operation in many methods for multiagent planning under uncertainty. In this paper we are interested in coordinating teams of cooperative agents. While many such problems can be formulated as Bayesian games with identical payoffs, little work has been done to improve solution methods. To help address this situation, we provide a branch and bound algorithm that optimally solves identical payoff Bayesian games. Our results show a marked improvement over previous methods, obtaining speedups of up to 3 orders of magnitude for synthetic random games, and reaching 10 orders of magnitude speedups for games in a DEC-POMDP context. This not only allows Bayesian games to be solved more efficiently, but can also improve multiagent planning techniques such as top-down and bottom-up algorithms for decentralized POMDPs. Heuristic Search for Identical Payoff Bayesian Games
We propose Path Disruption Games (PDGs), which consider collaboration between agents attempting to stop an adversary from travelling from a source node to a target node in a graph. PDGs can model physical or network security domains. The coalition attempts to stop the adversary by placing checkpoints in intermediate nodes in the graph, to make sure the adversary cannot travel through them. Thus, the coalition wins if it controls a node subset whose removal from the graph disconnects the source and target. We analyze this domain from a cooperative game theoretic perspective, and consider how the gains can be distributed between the agents controlling the vertices. We also consider power indices, which express the influence of each checkpoint location on the outcome of the game, and can be used to identify the most critical locations where checkpoints should be placed. We consider both general graphs and the restricted case of trees, and both a model with no cost for placing a checkpoint and a model where each vertex has its own cost for placing a checkpoint. Path Disruption Games
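A brute-force illustration of the underlying simple game (my own sketch, feasible only for small graphs; the paper's contribution includes tractable special cases such as trees): the coalition's value is 1 exactly when deleting its checkpoint nodes disconnects the source from the target, and raw Banzhaf swing counts then rank checkpoint locations by criticality:

```python
from itertools import combinations

def wins(checkpoints, edges, s, t, nodes):
    # The coalition wins iff removing its checkpoint nodes disconnects s, t.
    alive = set(nodes) - set(checkpoints)
    if s not in alive or t not in alive:
        return True
    frontier, seen = [s], {s}
    while frontier:                      # DFS over surviving nodes
        u = frontier.pop()
        if u == t:
            return False
        for a, b in edges:
            for u2, v2 in ((a, b), (b, a)):
                if u2 == u and v2 in alive and v2 not in seen:
                    seen.add(v2)
                    frontier.append(v2)
    return True

def banzhaf_swings(nodes, edges, s, t):
    # Raw Banzhaf swing counts over candidate checkpoint locations
    # (all nodes except the source and target), by full enumeration.
    players = [v for v in nodes if v not in (s, t)]
    swings = {v: 0 for v in players}
    for k in range(len(players) + 1):
        for coalition in combinations(players, k):
            for v in players:
                if v not in coalition \
                        and not wins(coalition, edges, s, t, nodes) \
                        and wins(coalition + (v,), edges, s, t, nodes):
                    swings[v] += 1
    return swings

nodes = ["s", "a", "b", "t"]
edges = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t")]
print(banzhaf_swings(nodes, edges, "s", "t"))  # a, b are symmetric swings
```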
Empirical analyses of complex games necessarily focus on a restricted set of strategies, and thus the value of empirical game models depends on effective methods for selectively exploring a space of strategies. We formulate an iterative framework for strategy exploration, and experimentally evaluate an array of generic exploration policies on three games: one infinite game with known analytic solution, and two relatively large empirical games generated by simulation. Policies based on iteratively finding a beneficial deviation or best response to the minimum-regret profile among previously explored strategies perform generally well on the profile-regret measure, although we find that some stochastic introduction of suboptimal responses can often lead to more effective exploration in early stages of the process. A novel formation-based policy performs well on all measures by producing low-regret approximate formations earlier than the deviation-based policies. Strategy Exploration in Empirical Games
There has been significant recent interest in game theoretic approaches to security, with much of the recent research focused on utilizing the leader-follower Stackelberg game model; for example, these games are at the heart of major applications such as the ARMOR program deployed for security at the LAX airport since 2007 and the IRIS program in use by the US Federal Air Marshals (FAMS). The foundational assumption for using Stackelberg games is that security forces (leaders), acting first, commit to a randomized strategy, while their adversaries (followers) choose their best response after surveillance of this randomized strategy. Yet, in many situations, the followers may act without observation of the leader's strategy, essentially converting the game into a simultaneous-move game model. Previous work fails to address how a leader should compute her strategy given this fundamental uncertainty about the type of game faced. Focusing on the complex games that are directly inspired by real-world security applications, the paper provides four contributions in the context of a general class of security games. First, exploiting the structure of these security games, the paper shows that the Nash equilibria in security games are interchangeable, thus alleviating the equilibrium selection problem. Second, resolving the leader's dilemma, it shows that under a natural restriction on security games, any Stackelberg strategy is also a Nash equilibrium strategy; and furthermore, the solution is unique in a class of real-world security games of which ARMOR is a key exemplar. Third, when faced with a follower that can attack multiple targets, many of these properties no longer hold. Fourth, our experimental results emphasize positive properties of games that do not fit our restrictions. Our contributions have major implications for the real-world applications. Stackelberg vs. Nash in Security Games: Interchangeability, Equivalence, and Uniqueness
In the multi-agent systems community, dependence theory and game theory are often presented as two alternative perspectives on the analysis of social interaction. Until now, no research has been done relating these two approaches. The unification presented here provides dependence theory with the sort of mathematical foundations it still lacks, and shows how game theory can incorporate dependence-theoretic considerations in a fully formal manner. Dependence Theory via Game Theory
In this paper, we present a mechanism to fly a swarm of UAVs with the aim of establishing a wireless backbone over a pre-specified area. The backbone is aimed at connecting intermittent wireless signal emitting mobile ground stations (GSs), comprising rescue teams and survivors. To this end, we present a decentralized behavior-based cooperative control architecture to search for unknown GSs and relay packets from one GS to another. A delay tolerant network protocol implementation is assumed on the agents but kept transparent to the GSs. The conditions used to determine when an agent switches between the different states of operation are adapted to maximize a measured performance score. The belief exchange mechanism for cooperation is designed to utilize low bandwidth through state estimation by a Dynamic Cell Structure. Extensive simulations are performed to measure performance metrics like latency and visit frequency with different numbers of agents in the field. UAV Swarm Coordination using Cooperative Control for establishing a Wireless Communications Backbone
In large wireless sensor networks, the problem of assigning radio frequencies to sensing agents such that no two connected sensors are assigned the same value (and would thus interfere with one another) is a major challenge. To tackle this problem, we develop a novel decentralised coordination algorithm that activates only a subset of the deployed agents, subject to the connectivity graph of this subset being provably 3-colourable in linear time, hence allowing the use of a simple decentralised graph colouring algorithm. Crucially, while doing this, our algorithm maximises the sensing coverage achieved by the selected sensing agents, which is given by an arbitrary non-decreasing submodular set function. We empirically evaluate our algorithm by benchmarking it against a centralised greedy algorithm and an optimal one, and show that the selected sensing agents manage to achieve 90\% of the coverage provided by the optimal algorithm, and 85\% of the coverage provided by activating all sensors. Moreover, we use a simple decentralised graph colouring algorithm to show the frequency assignment problem is easy in the resulting graphs; in all considered problem instances, this algorithm managed to find a colouring in less than 5 iterations on average. We then show how the algorithm can be used in dynamic settings, in which sensors can fail or new sensors can be deployed. In this setting, our algorithm provides 250\% more coverage over time compared to activating all available sensors simultaneously. A Decentralised Coordination Algorithm for Minimising Conflict and Maximising Coverage in Sensor Networks
Swarm models and algorithms generally assume synchronous agents, i.e., agents that act simultaneously. This hypothesis fits neither multi-agent simulators nor robotic systems. In this paper, we examine these issues for a patrolling ant algorithm: we study how different execution hypotheses influence the self-organization capabilities and patrolling performance of the algorithm. We consider the single-agent and multi-agent cases, the synchronism and determinism hypotheses, and the execution of the model with real robots. Influence of Different Execution Models on Patrolling Ants Behaviors: from Agents to Robots
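A toy idleness-driven patrolling model (my own sketch, not the paper's exact algorithm) already shows why the execution model matters: under simultaneous updates, co-located agents read the same visit stamps and clump together, while sequential updates let each agent see the stamps its predecessors just wrote, so the agents spread out:

```python
import random

def patrol(graph, starts, steps, synchronous=True, seed=0):
    # Idleness-driven ant patrolling: each agent moves to the adjacent
    # node visited longest ago, then stamps its arrival time.
    rng = random.Random(seed)
    last_visit = {v: 0 for v in graph}
    pos = list(starts)
    worst_idleness = 0
    for t in range(1, steps + 1):
        order = range(len(pos)) if synchronous \
            else rng.sample(range(len(pos)), len(pos))
        moves = {}
        for i in order:
            nxt = min(graph[pos[i]], key=lambda v: last_visit[v])
            if synchronous:
                moves[i] = nxt           # everyone reads the same stale stamps
            else:
                pos[i] = nxt             # stamps updated agent by agent
                last_visit[nxt] = t
        if synchronous:
            for i, nxt in moves.items():
                pos[i] = nxt
                last_visit[nxt] = t
        worst_idleness = max(worst_idleness, t - min(last_visit.values()))
    return worst_idleness

ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
print(patrol(ring, [0, 0], 200, synchronous=True))    # clumped agents
print(patrol(ring, [0, 0], 200, synchronous=False))   # agents spread out
```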
The coordination of emergency responders and robots to undertake a number of tasks in disaster scenarios is a grand challenge for multi-agent systems. Central to this endeavour is the problem of forming the best teams (coalitions) of responders to perform the various tasks in the area where the disaster has struck. Moreover, these teams may have to form, disband, and reform in different areas of the disaster region. This is because, in most cases, there will be more tasks than agents; hence, agents need to schedule themselves to attempt each task in turn. Second, the tasks themselves can be very complex: requiring the agents to work on them for different lengths of time and having deadlines by when they need to be completed. The problem is complicated still further when different coalitions perform tasks with different levels of efficiency. Given all these facets, we define this as the Coalition Formation with Spatial and Temporal constraints problem (CFSTP). We show that this problem is NP-hard; in particular, it contains the well-known complex combinatorial problem of Team Orienteering as a special case. Based on this, we design a Mixed Integer Program to optimally solve small-scale instances of the CFSTP and develop new anytime heuristics that can, on average, complete 97\% of the tasks for large problems (20 agents and 300 tasks). In so doing, our solutions represent the first results for CFSTP. Coalition Formation with Spatial and Temporal Constraints
Coordination within decentralized agent groups frequently requires reaching global consensus, but typical hierarchical approaches to reaching such decisions can be complex, slow, and not fault-tolerant. By contrast, recent studies have shown that in decentralized animal groups, a few individuals without privileged roles can guide the entire group to collective consensus on matters like travel direction. Inspired by these findings, we propose an implicit leadership algorithm for distributed multi-agent systems, which we prove reliably allows all agents to agree on a decision that can be determined by one or a few better-informed agents, through purely local sensing and interaction. The approach generalizes work on distributed consensus to cases where agents have different confidence levels in their preferred states. We present cases where informed agents share a common goal or have conflicting goals, and show how the number of informed agents and their confidence levels affects the consensus process. We further present an extension that allows for fast decision-making in a rapidly changing environment. Finally, we show how the framework can be applied to a diverse variety of applications, including mobile robot exploration, sensor network clock synchronization, and shape formation in modular robots. Collective Decision-Making in Multi-Agent Systems by Implicit Leadership
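A minimal confidence-weighted averaging sketch conveys the implicit-leadership effect (this is my illustration, not the paper's proven algorithm): every agent averages over its local neighbourhood, and an informed agent additionally anchors on its preferred value in proportion to its confidence, eventually pulling the whole ring to that value:

```python
import numpy as np

def implicit_leadership(x, conf, W, rounds=500):
    # Each agent replaces its state with a weighted average over its
    # neighbourhood (including itself); informed agents additionally
    # anchor on their preferred value in proportion to their confidence.
    goals = x.copy()
    for _ in range(rounds):
        num = W @ x + conf * goals
        den = W @ np.ones_like(x) + conf
        x = num / den
    return x

n = 12
W = np.zeros((n, n))          # ring topology with self-loops: local sensing
for i in range(n):
    for j in (i - 1, i, i + 1):
        W[i, j % n] = 1.0
conf = np.zeros(n)
conf[0] = 5.0                 # a single informed, confident agent...
x = np.zeros(n)
x[0] = 1.0                    # ...preferring decision value 1.0
print(implicit_leadership(x, conf, W).round(3))  # everyone ends near 1.0
```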
A classic example of multiagent coordination in a shared environment involves the use of pheromone deposits as a communication mechanism. Due to physical limitations in deploying actual pheromones, we propose a sparse representation of the pheromones using movable beacons. There is no communication between the beacons to propagate pheromones; instead, robots make movement and update decisions based entirely on local pheromone values. Robots deploy the beacons throughout the environment, and subsequently move them and update them using a variation of value iteration. Simulation results show that our approach is effective at finding good trails, locally improving them, and adapting to dynamic changes in the environment. Collaborative Foraging using Beacons
For an agent to intelligently use specifications of executable protocols, it is necessary that the agent can quickly and correctly assess the outcomes of that protocol if it is executed. In some cases, this information may be attached to the specification; however, this is not always the case. In this paper, we present an algorithm for deriving characterisations of protocols. These characterisations specify the preconditions under which the protocol can be executed, and the outcomes of this execution. The algorithm is applicable to definitions with infinite iteration, and recursive definitions that terminate. We prove how a restricted subset of non-terminating recursive protocols can be characterised by rewriting them into equivalent non-recursive definitions before characterisation. We then define a method for matching protocols from their characterisations. We prove that the complexity of the matching method is less than for methods such as a depth-first search algorithm. Our experimental evaluation confirms this. Characterising and Matching Iterative and Recursive Agent Interaction Protocols
A dynamic model of a multiagent system defines a probability distribution over possible system behaviors over time. Alternative representations for such models present tradeoffs in expressive power, and accuracy and cost for inferential tasks of interest. In a history-dependent representation, behavior at a given time is specified as a probabilistic function of some portion of system history. Models may be further distinguished based on whether they specify individual or joint behavior. Joint behavior models are more expressive, but in general grow exponentially in the number of agents. Graphical multiagent models (GMMs) provide a more compact representation of joint behavior, when agent interactions exhibit some local structure. We extend GMMs to condition on history, thus supporting inference about system dynamics. To evaluate this hGMM representation we study a voting consensus scenario, where agents on a network attempt to reach a preferred unanimous vote through a process of smooth fictitious play. We induce hGMMs and individual behavior models from example traces, showing that the former provide better predictions, given limited history information. These hGMMs also provide advantages for answering general inference queries compared to sampling the true generative model. History-Dependent Graphical Multiagent Models
Recursive reasoning of the form {\em what do I think that you think that I think} (and so on) arises often while acting rationally in multiagent settings. Several multiagent decision-making frameworks such as RMM, I-POMDP and the theory of mind model recursive reasoning as integral to an agent's rational choice. Real-world application settings for multiagent decision making are often mixed involving humans and human-controlled agents. In two large experiments, we studied the level of recursive reasoning generally displayed by humans while playing sequential general-sum and fixed-sum, two-player games. Our results show that subjects experiencing a general-sum strategic game display first or second level of recursive thinking with the first level being more prominent. However, if the game is made simpler and more competitive with fixed-sum payoffs, subjects predominantly attributed first-level recursive thinking to opponents thereby acting using second level of reasoning. Subsequently, we model the behavioral data obtained from the studies using the I-POMDP framework, appropriately augmented using well-known human judgment and decision models. Accuracy of the predictions by our models suggest that these could be viable ways for computationally modeling strategic behavioral data. Modeling Recursive Reasoning by Humans Using Empirically Informed Interactive POMDPs
Despite the recent advances in planning with MDPs, the problem of generating good policies is still hard. In this paper, we describe a novel planning approach for generating policies in MDPs that (1) determinizes the given MDP model into a classical planning problem; (2) generates partial policies off-line by producing solution plans to the classical planning problem and incrementally aggregating them into a policy; and (3) uses sequential Monte-Carlo (MC) simulations of the partial policies before execution, in order to assess the probability of replanning after execution. The objective of this approach is to quickly generate policies in which the probability of replanning is low and below a given threshold. We describe our planner {\tt RFF}, which incorporates the above ideas, and present an extensive experimental study with it. {\tt RFF} was the winner of the fully-observable probabilistic track in the 2008 International Planning Competition (IPC-08). In addition to our study of the IPC-08 results, we analyzed {\tt RFF}'s performance with different plan aggregation and determinization strategies, with varying amounts of MC sampling, and with varying threshold values for the probability of execution failures. The results of these experiments revealed how these choices impact the time performance of {\tt RFF} to generate solution policies and the quality of those solution policies (i.e., the average cumulated reward gathered from the execution of the policies). Incremental Plan Aggregation for Generating Policies in MDPs
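Step (3) is conceptually simple and worth making concrete: sample executions of the partial policy and count how often execution falls off the policy. The sketch below uses an invented chain-MDP and function names; it is not RFF's implementation:

```python
import random

def replanning_probability(policy, step, s0, goal, horizon=50,
                           samples=1000, seed=0):
    # Monte-Carlo estimate of the probability that executing a *partial*
    # policy reaches a state it does not cover (forcing replanning).
    rng = random.Random(seed)
    failures = 0
    for _ in range(samples):
        s = s0
        for _ in range(horizon):
            if s == goal:
                break
            if s not in policy:          # fell off the partial policy
                failures += 1
                break
            s = step(s, policy[s], rng)  # sample a stochastic outcome
    return failures / samples

# Hypothetical chain MDP: "advance" usually moves forward, but sometimes
# slips to a side state the partial policy does not cover.
def step(s, action, rng):
    return s + 1 if rng.random() < 0.9 else -s

policy = {s: "advance" for s in range(5)}
print(replanning_probability(policy, step, 0, 5))  # roughly 1 - 0.9**4
```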
We propose an integrated theoretical framework, grounded in possibility theory, to account for all the aspects involved in representing and changing beliefs, representing and generating justified desires, and selecting goals based on current and uncertain beliefs about the world and the preferences of the agent. Beliefs and desires of a cognitive agent are represented as two distinct possibility distributions. This is the original part of the proposed framework, in the sense that there does not yet exist a full-fledged possibilistic approach to BDI. In the proposed framework: (i) the possibility distribution representing the qualitative utilities associated with desires is the result of a rule-based deliberative process, that is, it depends on the mental state of the agent and is not given a priori; and (ii) the criteria for choosing goals take into account the fact that goals must be consistent not only in the logical sense, but also in the cognitive sense. An Integrated Possibilistic Framework for Goal Generation in Cognitive Agents
This paper aims at giving a classification of the argumentation games agents play within a multi-agent setting. We investigate different scenarios of such argumentation games that differ in the protocol used for argumentation, i.e., direct, synchronous, and dialectical argumentation protocols, the awareness that agents have of other agents' beliefs, and different settings for the preferences of agents. To this end we employ structured argumentation frameworks, an extension of Dung's abstract argumentation frameworks that gives a simple inner structure to arguments. We also provide some game theoretical results that characterize a specific argumentation game as strategy-proof, and develop argumentation selection strategies that turn out to be the dominant strategies for other specific argumentation games. Classification and Strategical Issues of Argumentation Games on Structured Argumentation Frameworks
Virtual human research has often modeled nonverbal behaviors based on the findings of psychological research. In recent years, however, there have been growing efforts to use automated, data-driven approaches to find patterns of nonverbal behaviors in video corpora and even thereby discover new factors that have not been previously documented. However, there have been few studies that compare how the behaviors generated by different approaches are interpreted by people. In this paper, we present an evaluation study to compare the perception of nonverbal behaviors, more specifically head nods, generated by different approaches. Studies have shown that head nods serve a variety of communicative functions and that the head is in constant motion during speaking turns. To evaluate the different approaches of head nod generation, we asked human subjects to evaluate videos of a virtual agent displaying nods generated by a human, by a machine learning data-driven approach, or by a hand-crafted rule-based approach. Results show that there is a significant effect on the perception of head nods in terms of appropriate nod occurrence, especially between the data-driven approach and the rule-based approach. Evaluating Models of Speaker Head Nods for Virtual Agents
Virtual humans are embodied software agents that should not only be realistic looking but also have natural and realistic behaviors. Traditional virtual human systems learn these interaction behaviors by observing how individuals respond in face-to-face situations (i.e., direct interaction). In contrast, this paper introduces a novel methodological approach called parasocial consensus sampling (PCS), which allows multiple individuals to vicariously experience the same situation in order to gain insight into the typical (i.e., consensus) view of human responses in social interaction. This approach can help tease apart what is idiosyncratic from what is essential, and help reveal the strength of cues that elicit social responses. Our PCS approach has several advantages over traditional methods: (1) it integrates data from multiple independent listeners interacting with the same speaker, (2) it associates a probability with how likely feedback will be given over time, (3) it can be used as a prior to analyze and understand face-to-face interaction data, and (4) it facilitates much quicker and cheaper data collection. In this paper, we apply our PCS approach to learn a predictive model of listener backchannel feedback. Our experiments demonstrate that a virtual human driven by our PCS approach creates significantly more rapport and is perceived as more believable than the virtual human driven by face-to-face interaction data. Parasocial Consensus Sampling: Combining Multiple Perspectives to Learn Virtual Human Behavior
In the Trading Agent Competition Ad Auctions Game, agents compete to sell products by bidding to have their ads shown in a search engine's sponsored search results. We report on the winning agent from the first (2009) competition, TacTex. TacTex operates by estimating the full game state from limited information, using these estimates to make predictions, and then optimizing its actions (daily bids, ads, and spending limits) with respect to these predictions. We present a full description of TacTex along with analysis of its performance in both the competition and controlled experiments. TacTex09: A Champion Bidding Agent for Ad Auctions
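To make the estimate-predict-optimize loop concrete, here is a deliberately simplified bid optimizer: given hypothetical predicted click volume and cost-per-click models, it grid-searches for the profit-maximizing daily bid. The functional forms and constants are illustrative assumptions, not TacTex's actual models.

```python
import numpy as np

def optimise_bid(bids, clicks_fn, cpc_fn, conv_rate, unit_profit):
    """Pick the bid maximising expected profit:
    clicks(bid) * (conv_rate * unit_profit - cpc(bid))."""
    return max(bids, key=lambda b: clicks_fn(b) *
               (conv_rate * unit_profit - cpc_fn(b)))

best = optimise_bid(
    bids=np.linspace(0.1, 2.0, 20),
    clicks_fn=lambda b: 100 * b / (b + 0.5),  # assumed saturating click response
    cpc_fn=lambda b: 0.8 * b,                 # assumed cost-per-click model
    conv_rate=0.1,
    unit_profit=10.0)
```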
This paper presents a quiz game agent that is attentive to the dynamics of multiple concurrent participants. The attentiveness of this agent is meant to be achieved by an utterance policy that determines the nature of the utterance and whether, when, and to whom to utter it. Two heuristics are introduced to drive the policy: the interaction atmosphere (AT) of the participants and the participant who tends to lead the conversation (CLP) at a specific point in time. These are estimated from the activeness of the participants' face movements and from acoustic information gathered during their discussion of the answer. To overcome the inherent drawback of a 2D agent, namely that it is difficult for multiple concurrent users to distinguish the focus of its attention, a physical pointer is also introduced. The system is then evaluated using a questionnaire investigation and video data analysis. The joint results of the experiments indicate that the methods for estimating AT and CLP work. Participants pay more attention to the agent and participate in the game more actively when the indication of the pointer is more comprehensive. How Multiple Concurrent Users React to a Quiz Agent Attentive to the Dynamics of Their Game Participation
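The abstract leaves the estimators' details open; as a rough sketch of how AT and CLP could be derived from activeness signals, the code below averages fused face movement and speech energy over time for AT and picks the most recently active participant for CLP. The fusion rule and window length are assumptions.

```python
import numpy as np

def estimate_at_clp(face_activity, speech_energy, recent_frames=30):
    """face_activity, speech_energy: [participants x frames] arrays.
    AT: mean joint activeness; CLP: most active participant recently."""
    activeness = face_activity + speech_energy            # assumed fusion
    at = float(activeness.mean())                         # atmosphere score
    clp = int(activeness[:, -recent_frames:].mean(axis=1).argmax())
    return at, clp
```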
Interactive narrative allows the user to play a role in a story and interact with other characters controlled by the system. Directorial control is a procedure for dynamically tuning the interaction towards the author's desired effects. Most existing approaches for directorial control are built within plot-centric frameworks for interactive narrative and do not have a systematic way to ensure that the characters are always well-motivated during the interaction. Thespian is a character-centric framework for interactive narrative. In our previous work on Thespian, we presented an approach for applying directorial control while not affecting the consistency of characters' motivations. This work evaluates the effectiveness of our directorial control approach. Given the priority of generating only well-motivated character behaviors, we empirically evaluate how often the author's desired effects are achieved. We also discuss how the directorial control procedure can save the author effort in configuring the characters. Evaluating Directorial Control in a Character-Centric Interactive Narrative Framework
Virtual Actors are at the heart of Interactive Storytelling systems, and in recent years multiple approaches have been described to specify their autonomous behaviour. One well-known problem is how to achieve a balance between the characters' autonomy, defined in terms of their individual roles and motivations, and the global structure of the plot, which tends to emphasise narrative phenomena and the coordination of multiple characters. In this paper we report a new approach to the definition of virtual characters aimed at achieving a balance between character autonomy and global plot structure. Where previous approaches have tended to focus on individual actions, our objective is to reincorporate higher-level narrative elements into the behaviour of individual actors and to address the relation between character and plot at the level of behaviour representation. To this end we introduce the notion of a character's Point of View and show how it enables a story to be described from the perspective of a number of different characters: it is not merely a presentation effect, but also a different way to tell a story. As an illustration, we have developed an Interactive Narrative based on Shakespeare's Merchant of Venice. The system, which features a novel planning approach to story generation, can generate very different stories depending on the Point of View adopted, and supports dynamic modification of the story world, which results in different story consequences. In the paper, we illustrate this approach with example narratives generated by our fully implemented prototype. Narrative Generation through Characters' Point of View
Memory-bounded techniques have shown great promise in solving complex multi-agent planning problems modeled as DEC-POMDPs. Much of the performance gain can be attributed to pruning techniques that alleviate the complexity of the exhaustive backup step of the original MBDP algorithm. Despite these improvements, state-of-the-art algorithms can still handle only a relatively small pool of candidate policies, which limits the quality of the solution in some benchmark problems. We present a new algorithm, Point-Based Policy Generation, which avoids searching the entire joint policy space altogether. The key observation is that the best policy for each reachable belief state can be constructed directly, instead of first producing a large set of candidates. We also provide an efficient approximate implementation of this operation. The experimental results show that our solution technique improves performance significantly in terms of both runtime and solution quality. Point-Based Policy Generation for Decentralized POMDPs
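Constructing the best policy for a given belief point is easiest to see in the single-agent case. The sketch below performs one PBVI-style point-based backup at a belief b over a set of alpha-vectors; it is the single-agent analogue of the operation discussed here, not the paper's decentralized algorithm.

```python
import numpy as np

def point_based_backup(b, alphas, T, O, R, gamma=0.95):
    """One point-based (PBVI-style) backup at belief b.
    T[a]: |S|x|S| transitions, O[a]: |S|x|Z| observation probabilities,
    R[a]: length-|S| rewards, alphas: list of length-|S| value vectors."""
    best_val, best_vec = -np.inf, None
    for a in range(len(T)):
        vec = R[a].astype(float)
        for z in range(O[a].shape[1]):
            # g(s) = sum_{s'} T[a][s,s'] * O[a][s',z] * alpha(s')
            candidates = [T[a] @ (O[a][:, z] * alpha) for alpha in alphas]
            vec += gamma * max(candidates, key=lambda g: b @ g)
        if b @ vec > best_val:
            best_val, best_vec = b @ vec, vec
    return best_vec  # the new alpha-vector that dominates at b
```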
Decentralized POMDPs provide an expressive framework for sequential multi-agent decision making. Despite their high complexity, there has been significant progress in scaling up existing algorithms, largely due to the use of point-based methods. Performing a point-based backup is a fundamental operation in state-of-the-art algorithms. We show that even a single backup step in the multi-agent setting is NP-complete. Despite this negative worst-case result, we present an efficient and scalable optimal algorithm as well as a principled approximation scheme. The optimal algorithm exploits recent advances in the weighted CSP literature to overcome the complexity of the backup operation. The poly-time approximation scheme provides a constant-factor approximation guarantee based on the number of belief points. In experiments on standard domains, the optimal approach provides significant speedup (up to two orders of magnitude) over the previous best optimal algorithm and is able to increase the number of belief points by more than a factor of 3. The approximation scheme also works well in practice, providing near-optimal solutions to the backup problem. Point-Based Backup for Decentralized POMDPs: Complexity and New Algorithms
We present a fully distributed multi-agent planning algorithm. Our methodology uses distributed constraint satisfaction to coordinate between agents, and local planning to ensure the consistency of these coordination points. To solve the distributed CSP efficiently, we must modify existing methods to take advantage of the structure of the underlying planning problem. In multi-agent planning domains with limited agent interaction, our algorithm empirically shows scalability beyond state-of-the-art centralized solvers. Our work also provides a novel, real-world setting for testing and evaluating distributed constraint satisfaction algorithms in structured domains, and illustrates how existing techniques can be altered to address such structure. A General, Fully Distributed Multi-Agent Planning Algorithm
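The interplay between shared coordination points and local plan consistency can be pictured as a constraint search in which each agent's local planner acts as a consistency check. The code below is a centralized toy simplification (the paper's algorithm is fully distributed); the variable names and callbacks are assumptions.

```python
def coordinate(variables, domains, local_ok, assignment=None):
    """Backtracking over shared coordination points; each agent's local
    planner (a local_ok callback) prunes partial assignments for which
    no consistent local plan exists."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(variables):
        return dict(assignment)
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if all(ok(assignment) for ok in local_ok):
            result = coordinate(variables, domains, local_ok, assignment)
            if result is not None:
                return result
        del assignment[var]
    return None
```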
We consider the problem of coordinating a team of agents engaged in executing a set of inter-dependent, geographically dispersed tasks in an oversubscribed and uncertain environment. In such domains, where there are sequence-dependent setup activities (e.g., travel), we argue that there is inherent leverage in having agents maintain advance schedules. In the distributed problem-solving setting we consider, each agent begins with a task itinerary, and, as execution unfolds and dynamics ensue (e.g., tasks fail, new tasks are discovered), agents must coordinate to extend and revise their plans accordingly. The team objective is to maximize the utility accrued from executed actions over a given time horizon. Our approach to solving this problem is based on distributed management of agent schedules. We describe an agent architecture that uses the synergy between intra-agent scheduling and inter-agent coordination to promote task allocation decisions that minimize travel time and maximize the time available for utility-accruing activities. Experimental results are presented that compare our agent's performance to that of an agent using an intelligent dispatching strategy previously shown to outperform our approach on synthetic, stateless utility maximization problems. Across a range of problems involving a mix of situated and non-situated tasks, our advance scheduling approach dominates this same dispatch strategy. Finally, we report performance results from an extension of the system on a limited set of field test experiments. Distributed Coordination of Mobile Agent Teams: The Advantage of Planning Ahead
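One concrete way an advance-scheduling agent can exploit sequence-dependent setup is cheapest insertion of newly discovered tasks into its itinerary; the toy heuristic below inserts a task where it adds the least travel. The travel callback and task representation are assumptions for illustration, not the paper's architecture.

```python
def cheapest_insertion(itinerary, task, travel):
    """Insert `task` at the position that minimises total travel time,
    a simple heuristic for maintaining an advance schedule."""
    def route_cost(route):
        return sum(travel(a, b) for a, b in zip(route, route[1:]))
    candidates = [itinerary[:i] + [task] + itinerary[i:]
                  for i in range(len(itinerary) + 1)]
    return min(candidates, key=route_cost)
```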
We present a new approach for finding contingent plans with loops and branches in situations where there is uncertainty in state properties and object quantities, but a lack of probabilistic information about these uncertainties. We use a state abstraction technique from the static analysis of programs, which uses 3-valued logic to compactly represent belief states with unbounded numbers of objects. Our approach for finding plans is to incrementally generalize and merge input example plans, which can be generated by classical planners. The expressiveness and scope of this approach are demonstrated using experimental results on common benchmark domains. Merging Example Plans into Generalized Plans for Non-deterministic Environments
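As a toy illustration of generalizing an example plan (not the paper's 3-valued abstraction machinery), the sketch below rolls the longest repeated contiguous action block of a linear plan into a loop construct.

```python
def roll_into_loop(plan):
    """Replace the longest repeated contiguous block of actions with a
    ('loop', repetitions, block) construct; a toy generalisation step."""
    n = len(plan)
    for size in range(n // 2, 0, -1):
        for start in range(n - 2 * size + 1):
            block = plan[start:start + size]
            reps = 1
            while plan[start + reps * size:start + (reps + 1) * size] == block:
                reps += 1
            if reps > 1:
                return (plan[:start] + [("loop", reps, block)]
                        + plan[start + reps * size:])
    return plan

# ['goto','pick','goto','pick','drop'] -> [('loop', 2, ['goto','pick']), 'drop']
print(roll_into_loop(["goto", "pick", "goto", "pick", "drop"]))
```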
The Affine ADD (AADD) is an extension of the Algebraic Decision Diagram (ADD) that compactly represents context-specific, additive, and multiplicative structure in functions from a discrete domain to a real-valued range. In this paper, we introduce a novel algorithm for efficiently finding AADD approximations, which we use to develop the MADCAP algorithm for AADD-based structured approximate dynamic programming (ADP) with factored MDPs. MADCAP requires less time and space to achieve comparable or better approximate solutions than APRICODD, the current state-of-the-art ADD-based ADP algorithm, and can provide approximate solutions for problems with context-specific, additive, and multiplicative structure on which APRICODD runs out of memory. Approximate Dynamic Programming with Affine ADDs
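The key structural idea is that AADD edges carry affine transforms (c, b), so a value is reconstructed as c + b * (child value). The sketch below shows evaluation of such a diagram, omitting the normalisation invariants of the canonical form; the node and edge classes are illustrative, not a library API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    var: Optional[str]            # None marks the terminal (value 0)
    high: Optional["Edge"] = None
    low: Optional["Edge"] = None

@dataclass
class Edge:
    c: float                      # additive offset
    b: float                      # multiplicative scale
    node: Node

def evaluate(edge: Edge, assignment: dict) -> float:
    """Value of an affine-weighted edge: c + b * value(child)."""
    if edge.node.var is None:
        return edge.c             # terminal child contributes 0
    branch = edge.node.high if assignment[edge.node.var] else edge.node.low
    return edge.c + edge.b * evaluate(branch, assignment)

# f(x) = 2 + 4x encoded with a single decision node over x
term = Node(var=None)
x = Node(var="x", high=Edge(1.0, 0.0, term), low=Edge(0.0, 0.0, term))
root = Edge(2.0, 4.0, x)
assert evaluate(root, {"x": True}) == 6.0 and evaluate(root, {"x": False}) == 2.0
```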
The Partially Observable Markov Decision Process (POMDP) is a popular framework for planning under uncertainty in partially observable domains. Yet the POMDP model is risk-neutral, in that it assumes the agent is maximizing the expected reward of its actions. In contrast, in domains like financial planning, it is often required that the agent's decisions be risk-sensitive, i.e., that they maximize the utility of the agent's actions under non-linear utility functions. Unfortunately, existing POMDP solvers cannot solve such planning problems exactly. By considering piecewise-linear approximations of utility functions, this paper addresses this shortcoming with three contributions: (i) it defines the Risk-Sensitive POMDP model; (ii) it derives the fundamental properties of the underlying value functions and provides a functional value iteration technique to compute them exactly; and (iii) it proposes an efficient procedure for determining the dominated value functions, to speed up the algorithm. Our experiments show that the proposed approach is feasible and applicable to realistic financial planning domains. Risk-Sensitive Planning in Partially Observable Environments
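A concave piecewise-linear utility can be represented as the lower envelope of a few lines, which is what makes exact functional value iteration over such approximations tractable. The sketch below evaluates the expected utility of a discrete outcome distribution under one such utility; the segment slopes and the outcome distribution are hypothetical.

```python
import numpy as np

def pwl_utility(lines):
    """Concave piecewise-linear utility: lower envelope of (slope, intercept) lines."""
    return lambda w: min(a * w + b for a, b in lines)

# risk-averse: losses weighted more steeply than gains (hypothetical segments)
u = pwl_utility([(2.0, 0.0), (0.5, 0.0)])

outcomes = np.array([-10.0, 5.0, 20.0])   # hypothetical wealth changes
probs = np.array([0.2, 0.5, 0.3])
expected_utility = float(probs @ np.array([u(w) for w in outcomes]))
print(expected_utility)  # compare with the risk-neutral expectation probs @ outcomes
```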