Full Papers

Session 1A – Innovative Applications
Session 1B – Teamwork I
Session 1C – Learning I
Session 1D – Social Choice I
Session 1E – Game Theory I
Session 1F – Planning
Session 2A – Virtual Agents
Session 2B – Distributed Problem Solving
Session 2C – Learning II
Session 2D – Social Choice II
Session 2E – Game Theory II
Session 2F – Knowledge Representation & Reasoning
Session 3A – Robotics I
Session 3C – Human-agent Interaction
Session 3D – Economies & Markets I
Session 3E – Game Theory III
Session 3F – Agent-based Software Development
Session 4A – Robotics II
Session 4B – Agent Societies
Session 4C – Argumentation & Negotiation
Session 4D – Economies & Markets II
Session 4E – Game Theory IV
Session 4F – Logics for Agency
Session 5A – Robotics III
Session 5B – Teamwork II
Session 5C – Emergence
Session 5D – Auction & Mechanism Design
Session 5E – Game & Agent Theories
Session 5F – Logic and Verification

Session 1A – Innovative Applications

1A_1

While three deployed applications of game theory for security have recently been reported at AAMAS, we as a community remain in the early stages of these deployments; there is a continuing need to understand the core principles for innovative security applications of game theory. Towards that end, this paper presents PROTECT, a game-theoretic system deployed by the United States Coast Guard (USCG) in the port of Boston for scheduling their patrols. USCG has termed the deployment of PROTECT in Boston a success, and efforts are underway to test it in the port of New York, with the potential for nationwide deployment.

PROTECT is premised on an attacker-defender Stackelberg game model and offers five key innovations. First, this system is a departure from the assumption of perfect adversary rationality noted in previous work, relying instead on a quantal response (QR) model of the adversary's behavior --- to the best of our knowledge, this is the first real-world deployment of the QR model. Second, to improve PROTECT's efficiency, we generate a compact representation of the defender's strategy space, exploiting equivalence and dominance. Third, we show how to practically model a real maritime patrolling problem as a Stackelberg game. Fourth, our experimental results illustrate that PROTECT's QR model more robustly handles real-world uncertainties than a perfect rationality model. Finally, in evaluating PROTECT, this paper for the first time provides real-world data: (i) comparison of human-generated vs PROTECT security schedules, and (ii) results from an Adversarial Perspective Team's (human mock attackers) analysis. PROTECT: A Deployed Game Theoretic System to Protect the Ports of the United States Eric Shieh, Bo Anhas 3 papers, Rong Yanghas 5 papers, Milind Tambehas 13 papers, Craig Baldwin, Joseph DiRenzo, Ben Maule, Garrett Meyer

1A_2

This paper describes an innovative multiagent system called SAVES with the goal of conserving energy in commercial buildings. We specifically focus on an application to be deployed in an existing university building that provides several key novelties: (i) jointly performed with the university facility management team, SAVES is based on actual occupant preferences and schedules, actual energy consumption and loss data, real sensors and hand-held devices, etc.; (ii) it addresses novel scenarios that require negotiations with groups of building occupants to conserve energy; (iii) it focuses on a non-residential building, where human occupants do not have a direct financial incentive in saving energy and thus requires a different mechanism to effectively motivate occupants; and (iv) SAVES uses a novel algorithm for generating optimal MDP policies that explicitly consider multiple criteria optimization (energy and personal comfort) as well as uncertainty over occupant preferences when negotiating energy reduction - this combination of challenges has not been considered in previous MDP algorithms.
In a validated simulation testbed, we show that SAVES substantially reduces the overall energy consumption compared to the existing control method while achieving comparable average satisfaction levels for occupants. As a real-world test, we provide results of a trial study where SAVES is shown to lead occupants to conserve energy in real buildings. SAVES: A Sustainable Multiagent Application to Conserve Building Energy Considering Occupants Jun-young Kwakhas 3 papers, Pradeep Varakanthamhas 6 papers, Rajiv Maheswaranhas 7 papers, Milind Tambehas 13 papers, Farrokh Jazizadehhas 2 papers, Geoffrey Kavulyahas 2 papers, Laura Kleinhas 2 papers, Burcin Becerik-Gerberhas 2 papers, Timothy Hayeshas 2 papers, Wendy Woodhas 2 papers

1A_3

Cyber security is increasingly important for defending computer systems from loss of privacy or unauthorised use. One important aspect is threat analysis - how does an attacker infiltrate a system and what do they want once they are inside. This paper considers the problem of Active Malware Analysis, where we learn about the human or software intruder by actively interacting with it with the goal of learning about its behaviours and intentions, whilst at the same time that intruder may be trying to avoid detection or showing those behaviours and intentions. This game-theoretic active learning is then used to obtain a behavioural clustering of malware, an important contribution for both understanding malware at a high level and more crucially, for the deployment of effective anti-malware defences. This paper makes the following contributions: (i) A formal definition of the game-theoretic active malware analysis problem; (ii) A fast algorithm for learning about a malware in the active analysis problem which utilises the concept of reducing entropy in the beliefs about the malware; (iii) A virtual machine based agent architecture for the implementation of the active malware analysis problem and (iv) A behaviour based clustering of malware behaviour which is shown to be more accurate than a similar clustering using only passive information about the malware. Active Malware Analysis using Stochastic Games Simon Williamsonhas 2 papers, Pradeep Varakanthamhas 6 papers, Debin Gao, Ong Chen Hui

1A_4

Contemporary maritime piracy presents a significant threat to the global shipping industry, with annual costs estimated at up to US\$12bn. To address the threat, commanders and policymakers need new data-driven decision-support tools that will allow them to plan and execute counter-piracy activities most effectively. So far, however, the provision of such tools has been very limited. To fill this gap, we have employed the multi-agent approach and developed a novel suite of computational tools and techniques for operational management of counter-piracy operations. A comprehensive agent-based simulation enables the stakeholders to assess the efficiency of a range of piracy counter-measures, including recommended transit corridors, escorted convoys, group transit schemes, route randomization and navy patrol deployments. Decision-theoretic and game-theoretic optimization techniques further assist in discovering counter-measure configurations that yield the best trade-off between transportation security and cost. We demonstrate our approach on two case studies based on the problems and solutions currently explored by the maritime security community. Our work is the first integrated application of agent-based techniques to high-seas maritime security and opens a wide range of directions for follow-up research and development. Agents vs. Pirates: Multi-agent Simulation and Optimization to Fight Maritime Piracy Michal Jakobhas 2 papers, Ondřej Vanĕkhas 2 papers, Ondřej Hrstka, Michal Pĕchoučekhas 6 papers

1A_5

Nearly 20% of total energy consumption in the United States is accounted for in heating, ventilation, and air conditioning (HVAC) systems. Smart sensing and adaptive energy management agents can greatly decrease the energy usage of HVAC systems in many building applications, for example by enabling the operator to shut off HVAC to unoccupied rooms. We implement a multi-modal sensor agent that is non-intrusive and low-cost, combining information such as motion detection, CO2 reading, sound level, ambient light, and door state sensing. We show that in our live testbed at the USC campus, these sensor agents can be used to accurately estimate the number of occupants in each room using machine learning techniques, and that these techniques can also be applied to predict future occupancy by creating agent models of the occupants. These predictions will be used by control agents to enable the HVAC system increase its efficiency by continuously adapting to occupancy forecasts of each room. Improving Building Energy Efficiency with a Network of Sensing, Learning and Prediction Agents Sunil Mamidi, Yu-Han Changhas 3 papers, Rajiv Maheswaranhas 7 papers

Session 1B – Teamwork I

1B_1

In this paper, we propose to guide reinforcement learning (RL) with expert coordination knowledge for multi-agent problems managed by a central controller. The aim is to learn to use expert coordination knowledge to restrict the joint action space and to direct exploration towards more promising states, thereby improving the overall learning rate. We model such coordination knowledge as constraints and propose a two-level RL system that utilizes these constraints for online applications. Our declarative approach towards specifying coordination in multi-agent learning allows knowledge sharing between constraints and features (basis functions) for function approximation. Results on a soccer game and a tactical real-time strategy game show that coordination constraints improve the learning rate compared to using only unary constraints. The two-level RL system also outperforms existing single-level approach that utilizes joint action selection via coordination graphs. Coordination Guided Reinforcement Learning Qiangfeng Peter Lau, Mong Li Lee, Wynne Hsu

1B_2

We consider coalition formation problems for agents with an underlying \emph{synergistic graph}, where edges between agents represent some vital synergistic link, such as communication, trust, or physical constraints. A coalition is infeasible if its members do not form a connected subgraph, meaning parts of the coalition are isolated from others. Current state-of-the-art coalition formation algorithms are not designed for problems over synergistic graphs. They assume that \emph{all} coalitions are feasible and so involve redundant computation when this is not the case.
Accordingly, we propose algorithms, namely D-SlyCE and DyCE, to enumerate all feasible coalitions in a distributed fashion and find the optimal feasible coalition structure respectively. When evaluated on a variety of synergistic graphs, D-SlyCE is up to 660 times faster while DyCE is up to 7x$10^4$ times faster than the state-of-the-art algorithms. For particular classes of graphs, D-SlyCE is the first to enumerate valid coalition values for up to 50 agents and DyCE is the first algorithm to find the optimal coalition structure for up to 30 agents in minutes as opposed to months for previous algorithms. On Coalition Formation with Sparse Synergies Thomas Voice, Sarvapali Ramchurnhas 2 papers, Nick Jenningshas 11 papers

1B_3

In a wide range of emerging applications, from disaster management to intelligent sensor networks, teams of software agents can be deployed to effectively solve complex distributed problems. To achieve this, agents typically need to communicate locally sensed information to each other. However, in many settings, there are heavy constraints on the communication infrastructure, making it infeasible for every agent to broadcast all relevant information to everyone else. To address this challenge, we investigate how agents can make good local decisions about what information to send to a set of communication channels with limited bandwidths such that the overall system utility is maximised. Specifically, to solve this problem efficiently in large-scale systems with hundreds or thousands of agents, we develop a novel decentralised algorithm. This combines multi-agent learning techniques with fast decision-theoretic reasoning mechanisms that predict the impact a single agent has on the entire system. We show empirically that our algorithm consistently achieves 85% of a hypothetical centralised optimal strategy with full information, and that it significantly outperforms a number of baseline benchmarks (by up to 600%).
Decentralised Channel Allocation and Information Sharing for Teams of Cooperative Agents Sebastian Steinhas 2 papers, Simon Williamsonhas 2 papers, Nick Jenningshas 11 papers

1B_4

In many real-life networks, such as urban structures, protein interactions and social networks, one of the key issues is to measure the centrality of nodes, i.e. to determine which nodes and edges are more central to the functioning of the entire network than others. In this paper we focus on \emph{betweenness centrality} - a metric based on which the centrality of a node is measured involving the number of shortest paths that pass through that node. This metric has been shown to be well suited for many, often complex, networks. In itsstandard form, the betweenness centrality, just like other centrality metrics, evaluates nodes based on their individual contributions to the functioning of the network. For instance, the importance of an intersection in a road network can be computed as the difference between the full capacity of this network and its capacity when the intersection is completely shut down. However, as recently argued in the literature, such an approach is inadequate for many real-life applications, as, for example, multiple nodes can fail simultaneously. Thus, what would be desirable is to refine the existing centrality metrics such that they take into account not only the functioning of nodes as individual entities but also as members of groups of nodes. One recently-proposed way of doing this is based on the \emph{Shapley Value} - a solution concept in cooperative game theory that measures in a fair way the contributions of players to all the coalitions that they could possibly participate in. Although this approach has been used to extend various centrality metrics, such an extension to betweenness centrality is yet to be developed. The main challenge when developing such a refinement is to tackle the computational complexity; the Shapely Value generally requires an exponential number of operations, making its use limited to a small number of player (or nodes in our context). Against this background, our main contribution in this paper is to refine the betweenness centrality metric based on the Shapley Value: we develop an algorithm for computing this new metric, and show that it has the same complexity as the best known algorithm due to Brandes to compute the standard betweenness centrality (i.e., polynomial in the size of the network). Finally, we show that our results can be extended to another important centrality metric called stress centrality. A New Approach to Betweenness Centrality Based on the Shapley Value Piotr Szczepański, Tomasz Michalak, Talal Rahwan

1B_5

Many multi-agent applications may involve a notion of spatial coherence. For instance, simulations of virtual agents often need to model a coherent group or crowd. Alternatively, robots may prefer to stay within a pre-specified communication range. This paper proposes an extension of a decentralized, reactive collision avoidance framework, which defines obstacles in the velocity space, known as Velocity Obstacles (VOs), for coherent groups of agents. The extension, referred to in this work as a Loss of Communication Obstacle (LOCO), aims to maintain proximity among agents by imposing constraints in the velocity space and restricting the set of feasible controls. If the introduction of LOCOs results in a problem that is too restrictive, then the proximity constraints are relaxed in order to maintain collision avoidance. If agents break their proximity constraints, a method is applied to reconnect them. The approach is fast and integrates nicely with the Velocity Obstacle framework. It yields improved coherence for groups of robots connected through an input constraint graph that are moving with constant velocity. Simulated environments involving a single team moving among static obstacles, as well as multiple teams operating in the same environment, are considered in the experiments and evaluated for collisions, computational cost and proximity constraint maintenance. The experiments show that improved coherence is achieved while maintaining collision avoidance, at a small computational cost and path quality degradation. Maintaining Team Coherence under the Velocity Obstacle Framework Andrew Kimmel, Andrew Dobson, Kostas Bekris

Session 1C – Learning I

1C_1

Recent advances in reinforcement learning have yielded several PAC-MDP algorithms that, using the principle of optimism in the face of uncertainty, are guaranteed to act near-optimally with high probability on all but a polynomial number of samples. Unfortunately, many of these algorithms, such as R-MAX, perform poorly in practice because their initial exploration in each state, before the associated model parameters have been learned with confidence, is random. Others, such as \emph{Model-Based Interval Estimation} (MBIE) have weaker sample complexity bounds and require careful parameter tuning. This paper proposes a new PAC-MDP algorithm called \emph{V-MAX} designed to address these problems. By restricting its optimism to future visits, V-MAX can exploit its experience early in learning and thus obtain more cumulative reward than R-MAX. Furthermore, doing so does not compromise the quality of exploration, as we prove bounds on the sample complexity of V-MAX that are identical to those of R-MAX. Finally, we present empirical results in two domains demonstrating that V-MAX can substantially outperform R-MAX and match or outperform MBIE while being easier to tune, as its performance is invariant to conservative choices of its primary parameter. V-MAX: Tempered Optimism for Better PAC Reinforcement Learning Karun Rao, Shimon Whiteson

1C_2

Although Reinforcement Learning (RL) has been successfully deployed in a variety of tasks, learning speed remains a fundamental problem for applying RL in complex environments. Transfer learning aims to ameliorate this shortcoming by speeding up learning through the adaptation of previously learned behaviors in similar tasks. Transfer techniques often use an inter-task mapping, which determines how a pair of tasks are related. Instead of relying on a hand-coded inter-task mapping, this paper proposes a novel transfer learning method capable of autonomously creating an inter-task mapping by using a novel combination of sparse coding, sparse projection learning and sparse Gaussian processes. We also propose two new transfer algorithms (\emph{TrLSPI} and \emph{TrFQI}) based on least squares policy iteration and fitted-Q-iteration. Experiments not only show successful transfer of information between similar tasks, inverted pendulum to cart pole, but also between two very different domains: mountain car to cart pole. This paper empirically shows that the learned inter-task mapping can be successfully used to (1) improve the performance of a learned policy on a fixed number of samples, (2) reduce the learning times needed by the algorithms to converge to a policy on a fixed number of samples, and (3) converge faster to a near-optimal policy given a large number of samples. Reinforcement Learning Transfer via Sparse Coding Haitham Bou Ammar, Karl Tuylshas 5 papers, Matthew Taylorhas 2 papers, Kurt Driessen, Gerhard Weisshas 2 papers

1C_3

Understanding how we are able to perform a diverse set of complex tasks is a central question for the Artificial Intelligence community. A popular approach is to use temporal abstraction as a framework to capture the notion of subtasks. However, this transfers the problem to finding the right subtasks, which is still an open problem. Existing approaches for subtask generation require too much knowledge of the environment, and the abstractions they create can overwhelm the agent. We propose a simple algorithm inspired by small world networks to learn subtasks while solving a task that requires virtually no information of the environment. Additionally, we show that the subtasks we learn can be easily composed by the agent to solve any other task; more formally, we prove that any task can be solved using only a logarithmic combination of these subtasks and primitive actions. Experimental results show that the subtasks we generate outperform other popular subtask generation schemes on standard domains. Learning in a Small World Arun Tejasvi Chaganty, Prateek Gaur, Balaraman Ravindran

1C_4

Learning in multi-agent settings has recently garnered much interest, the result of which has been the development of somewhat effective multi-agent learning (MAL) algorithms for repeated normal-form games. However, general-purpose MAL algorithms for richer environments, such as general-sum repeated stochastic (Markov) games (RSGs), are less advanced. Indeed, previously created MAL algorithms for RSGs are typically successful only when the behavior of associates meets specific game theoretic assumptions and when the game is of a particular class (such as zero-sum games). In this paper, we present a new algorithm, called Pepper, that can be used to extend MAL algorithms designed for repeated normal-form games to RSGs. We demonstrate that Pepper creates a family of new algorithms, each of whose asymptotic performance in RSGs is reminiscent of its asymptotic performance in related repeated normal-form games. We also show that some algorithms formed with Pepper outperform existing algorithms in an interesting RSG.
Just Add Pepper: Extending Learning Algorithms for Repeated Matrix Games to Repeated Markov Games Jacob Crandall

1C_5

Recent work has defined an optimal reward problem (ORP) in which an agent designer, with an objective reward function that \emph{evaluates} an agent's behavior, has a choice of what reward function to build into a learning or planning agent to \emph{guide} its behavior. Existing results on ORP show \emph{weak mitigation} of limited computational resources, i.e., the existence of reward functions so that agents when guided by them do better than when guided by the objective reward function. These existing results ignore the cost of finding such good reward functions. We define a nested optimal reward and control architecture that achieves \emph{strong mitigation} of limited computational resources. We show empirically that the designer is better off using a new architecture that spends some of its limited resources learning a good reward function instead of using all of its resources to optimize its behavior with respect to the objective reward function. Strong Mitigation: Nesting Search for Good Policies Within Search for Good Reward Jeshua Bratman, Satinder Singhhas 3 papers, Richard Lewis, Jonathan Sorg

Session 1D – Social Choice I

1D_1

This paper considers randomized strategyproof approximations to distance rationalizable voting rules. It is shown that the \emph{Random Dictator} voting rule (return the top choice of a random voter) nontrivially approximates a large class of distances with respect to unanimity. Any randomized voting rule that deviates too greatly from the Random Dictator voting rule is shown to obtain a trivial approximation (i.e., equivalent to ignoring the voters' votes and selecting an alternative uniformly at random).
The outlook for consensus classes, other than unanimity is bleaker. This paper shows that for a large number of distance rationalizations, with respect to the majority and Condorcet consensus classes that no strategyproof randomized rule can asymptotically outperform uniform random selection of an alternative. This paper also shows that veto cannot be approximated nontrivially when approximations are measured with respect to minimizing the number of vetoes an alternative receives. Strategyproof Approximations of Distance Rationalizable Voting Rules Travis Servicehas 2 papers, Julie Adamshas 2 papers

1D_2

We study elections in which voters may submit partial ballots consisting of truncated lists: each voter ranks some of her top candidates (and possibly some of her bottom candidates) and is indifferent among the remaining ones. Holding elections with such votes requires adapting classical voting rules (which expect complete rankings as input) and these adaptations create various opportunities for candidates who want to increase their chances of winning. We provide complexity results regarding planning various kinds of campaign in such settings, and we study the complexity of the possible winner problem for the case of truncated votes. Campaigns for Lazy Voters: Truncated Ballots Dorothea Baumeisterhas 2 papers, Piotr Faliszewski, Jérôme Langhas 3 papers, Jörg Rothehas 2 papers

1D_3

We study the problem of computing possible and necessary winners for partially specified weighted and unweighted tournaments. This problem arises naturally in elections with incompletely specified votes, partially completed sports competitions, and more generally in any scenario where the outcome of some pairwise comparisons is not yet fully known. We specifically consider a number of well-known solution concepts---including the uncovered set, Borda, ranked pairs, and maximin---and show that for most of them possible and necessary winners can be identified in polynomial time. These positive algorithmic results stand in sharp contrast to earlier results concerning possible and necessary winners given partially specified preference profiles. Possible and Necessary Winners of Partial Tournaments Haris Azizhas 3 papers, Markus Brill, Felix Fischer, Paul Harrensteinhas 2 papers, Jérôme Langhas 3 papers, Hans Georg Seedig

1D_4

This paper considers the communication complexity of approximating common voting rules. Both upper and lower bounds are presented. For $n$ voters and $m$ alternatives. It is shown that for all $\epsilon \in (0,1)$, the communication complexity of obtaining a $1 - \epsilon$ approximation to Borda is $O(\log(\frac{1}{\epsilon}) nm)$. A lower bound of $\Omega(nm)$ is provided for small values of $\epsilon$. The communication complexity of computing the true Borda winner is $\Omega(nm\log(m))$. Thus, in the case of Borda, one can obtain arbitrarily good approximations with less communication overhead than is required to compute the true Borda winner.
For other voting rules, no such $1 \pm \epsilon$ approximation scheme exists. In particular, it is shown that the communication complexity of computing any constant factor approximation, $\rho$, to Bucklin is $\Omega(\frac{nm}{\rho^2})$. Conitzer and Sandholm show that the communication complexity of computing the true Bucklin winner is $O(nm)$. However, we show that for all $\delta \in (0,1)$, the communication complexity of computing a $m^{\delta}$ approximate winner in Bucklin elections is $O(nm^{1-\delta}\log(m))$. For $\delta \in (\frac{1}{2}, 1)$ a lower bound of $\Omega( nm^{1-2\delta} )$ is also provided.
Similar lower bounds are presented on the communication complexity of computing approximate winners in Copeland elections. Communication Complexity of Approximating Voting Rules Travis Servicehas 2 papers, Julie Adamshas 2 papers

Session 1E – Game Theory I

1E_1

In this paper, we examine \emph{hedonic coalition formation games} in which each player's preferences over partitions of players depend only on the members of his coalition. We present three main results in which restrictions on the preferences of the players guarantee the existence of stable partitions for various notions of stability. The preference restrictions pertain to \emph{top responsiveness} and \emph{bottom responsiveness} which model optimistic and pessimistic behavior of players respectively. The existence results apply to natural subclasses of \emph{additively separable hedonic games} and \emph{hedonic games with $B$-preferences}. It is also shown that our existence results cannot be strengthened to the case of stronger known stability concepts. Existence of Stability in Hedonic Coalition Formation Games Haris Azizhas 3 papers, Florian Brandl

1E_2

We introduce a measure for the level of stability against coalitional deviations, called \emph{stability scores}, which generalizes widely used notions of stability in non-cooperative games. We use the proposed measure to compare various Nash equilibria in congestion games, and to quantify the effect of game parameters on coalitional stability. For our main results, we apply stability scores to analyze and compare the Generalized Second Price (GSP) and Vickrey-Clarke-Groves (VCG) ad auctions. We show that while a central result of the ad auction literatures is that the GSP and VCG auctions implement the same outcome in one of the equilibria of GSP, the GSP outcome is far more stable. Finally, a modified version of VCG is introduced, which is group strategy-proof, and thereby achieves the highest possible stability score. Stablity Scores: Measuring Coalitional Stability Michal Feldmanhas 2 papers, Reshef Meir, Moshe Tennenholtzhas 2 papers

1E_3

In many real-world settings, the structure of the environment constrains the formation of coalitions among agents. Therefore, examining the stability of formed coalition structures in such settings is of natural interest. We address this by considering core-stability within various models of cooperative games with structure. First, we focus on characteristic function games defined on graphs that determine feasible coalitions. In particular, a coalition $S$ can emerge only if $S$ is a connected set in the graph. We study the (now modified) core, in which it suffices to check only feasible deviations. Specifically, we investigate core non-emptiness as well as the complexity of computing stable configurations. We then move on to the more general class of (graph-restricted) partition function games, where the value of a coalition depends on which other coalitions are present, and provide the first stability results in this domain. Finally, we propose a ``Bayesian' extension of partition function games, in which information regarding the success of a deviation is provided in the form of a probability distribution describing the possible reactions of non-deviating agents, and provide the first core-stability results in this model also. Coalitional Stability in Structured Environments Georgios Chalkiadakishas 4 papers, Vangelis Markakis, Nick Jenningshas 11 papers

1E_4

Cooperative games with overlapping coalitions (OCF games) model scenarios where agents can distribute their resources among several tasks; each task generates a profit which may be freely divided among the agents participating in the task. The goal of this work is to initiate a systematic investigation of algorithmic aspects of OCF games. We propose a discretized model of overlapping coalition formation, where each agent $i \in N$ has a weight $w_i \in \mathbb{N}$ and may allocate an integer amount of weight to any task. Within this framework, we focus on the computation of outcomes that are socially optimal and/or stable. We discover that the algorithmic complexity of the associated problems crucially depends on the amount of resources that each agent possesses, the maximum coalition size, and the pattern of interaction among the agents. We identify several constraints that lead to tractable subclasses of OCF games, and provideefficient algorithms for games that belong to these subclasses. We supplement our tractability results by hardness proofs, which clarify the role of our constraints. Overlapping Coalition Formation Games: Charting the Tractability Frontier Yair Zickhas 2 papers, Georgios Chalkiadakishas 4 papers, Edith Elkindhas 3 papers

1E_5

A Coalition Structure Generation (CSG) problem involves partitioning a set of agents into coalitions so that the social surplus is maximized. Recently, Ohta et al. developed an efficient algorithm for solving CSG, assuming that a characteristic function is represented as a set of rules, such as marginal contribution networks (MC-nets).
In this paper, we extend the formalization of CSG in Ohta et al. so that it can handle negative value rules. Here, we assume that a characteristic function is represented by either MC-nets (without externalities) or embedded MC-nets (with externalities). Allowing negative value rules is important since it can reduce the efforts for describing a characteristic function. In particular, in many realistic situations, it is natural to assume that a coalition has negative externalities to other coalitions.
To handle negative value rules, we examine the following three algorithms: (i) a full transformation algorithm, (ii) a partial transformation algorithm, and (iii) a direct encoding algorithm. We show that the full transformation algorithm is not scalable in MC-nets (the worst-case representation size is $\Omega(n^2)$, where n is the number of agents), and does not seem to be tractable in embedded MC-nets (representation size would be $\Omega(2^n)$). In contrast, by using the partial transformation or direct encoding algorithms, an exponential blow-up never occurs even for embedded MC-nets. For embedded MC-nets, the direct encoding algorithm creates less rules than the partial transformation algorithm.
Experimental evaluations show that the direct encoding algorithm is scalable, i.e., an off-the-shelf optimization package (CPLEX) can solve problem instances with 100 agents and rules within 10 seconds. Handling Negative Value Rules in MC-net-based Coalition Structure Generation Suguru Uedahas 3 papers, Takato Hasegawa, Naoyuki Hashimoto, Naoki Ohta, Atsushi Iwasakihas 4 papers, Makoto Yokoohas 4 papers

Session 1F – Planning

1F_1

Markov Decision Processes are one of the most widely used frameworks to formulate probabilistic planning problems. Since planners are often risk-sensitive in high-stake situations, non-linear utility functions are often introduced to describe their preferences among all possible outcomes. Alternatively, risk-sensitive decision makers often require their plans to satisfy certain worst-case guarantees.
We show how to combine these two approaches by considering problems where we maximize the expected utility of the total reward subject to worst-case constraints. We generalize several existing results on the structure of optimal policies to the constrained case, both for finite and infinite horizon problems. We provide a Dynamic Programming algorithm to compute the optimal policy, and we introduce an admissible heuristic to effectively prune the search space. Finally, we use a stochastic shortest path problem on large real-world road networks to demonstrate the practical applicability of our method. Probabilistic Planning with Non-Linear Utility Functions and Worst-Case Guarantees Stefano Ermon, Carla Gomes, Bart Selman, Alexander Vladimirsky

1F_2

Multiagent planning under uncertainty has seen important progress in recent years. Two techniques, in particular, have substantially advanced efficiency and scalability of planning. Multiagent heuristic search gains traction by pruning large portions of the joint policy space deemed suboptimal by heuristic bounds. Alternatively, influence-based abstraction reformulates the search space of joint policies into a smaller space of influences, which represent the probabilistic effects that agents' policies may exert on one another. These techniques have been used independently, but never together, to solve larger problems (for Dec-POMDPs and subclasses) than previously possible. In this paper, we take the logical albeit nontrivial next step of combining multiagent A* search and influence-based abstraction into a single algorithm. The mathematical foundation that we provide, such as partially-specified influence evaluation and admissible heuristic definition, enables an initial investigation into whether the two techniques bring complementary gains. Our empirical results indicate that A* can provide significant computational savings on top of those already afforded by influence-space search, thereby bringing a significant contribution to the field of multiagent planning under uncertainty. Heuristic Search of Multiagent Influence Space Stefan Witwickihas 2 papers, Frans Oliehoekhas 2 papers, Leslie Kaelbling

1F_3

Plan generation is important in a number of agent applications, but such applications generally require elaborate domain models that include not only the definitions of the actions that an agent can perform in a given domain, but also information about the most effective ways to generate plans for the agent in that domain. Such models typically take a large amount of human effort to create.
To alleviate this problem, we have developed a hierarchical goal-based planning formalism and a planning algorithm, GDP (Goal-Decomposition Planner), that combines some aspects of both HTN planning and domain-independent planning. For example, it allows the planning agent to use domain-independent heuristic functions to guide the application of both methods and actions.
This paper describes the formalism, planning algorithm, correctness theorems, and the results of a large experimental study. The experiments show that our planning algorithm works as well as the well-known SHOP2 HTN planner, using domain models only about half the size of SHOP2's. A Hierarchical Goal-Based Formalism and Algorithm for Single-Agent Planning Vikas Shivashankar, Ugur Kuterhas 3 papers, Dana Nauhas 2 papers, Ron Alford

1F_4

We consider the problem of automated planning and control for an execution agent operating in environments that are partially-observable with deterministic exogenous events. We describe a new formalism and a new algorithm, \textsc{DiscoverHistory}, that enables our agent, DHAgent, to proactively expand its knowledge of the environment during execution by forming explanations that reveal information about the world. We describe how DHAgent uses this information to improve the projections made during planning. Finally, we present an ablation study that examines the impact of explanation generation on execution performance. The results of this study demonstrate that our approach significantly increases the goal achievement success rate of DHAgent against an ablated version that does not perform explanation. DiscoverHistory: Understanding the Past in Planning and Execution Matthew Molineaux, Ugur Kuterhas 3 papers, Matthew Klenk

1F_5

n this paper, we investigate real-time path planning in static terrain, as needed in video games. We introduce the game time model, where time is partitioned into uniform time intervals, an agent can execute one movement during each time interval, and search and movements are done in parallel. The objective is to move the agent from its start location to its goal location in as few time intervals as possible. For known terrain, we show experimentally that Time-Bounded A* (TBA*), an existing real-time search algorithm for undirected terrain, needs fewer time intervals than two state-of-the-art real-time search algorithms and about the same number of time intervals as A*. TBA*, however, cannot be used when the terrain is not known initially. For initially partially or completely unknown terrain, we thus propose a new search algorithm. Our Time-Bounded Adaptive A* (TBAA*) extends TBA* to on-line path planning with the freespace assumption by combining it with Adaptive A*. We prove that TBAA* either moves the agent from its start location to its goal location or detects that this is impossible - an important property since many existing realtime search algorithms are not able to detect efficiently that no path exists. Furthermore, TBAA* can eventually move the agent on a cost-minimal path from its start location to its goal location if it resets the agent into its start location whenever it reaches its goal location. We then show experimentally in initially partially or completely unknown terrain that TBAA* needs fewer time intervals than several state-of-the-art complete and real-time search algorithms and about the same number of time intervals as the best compared complete search algorithm, even though it has the advantage over complete search algorithms that the agent starts to move right away. Time Bounded Adaptive A* Carlos Hernández, Jorge Baier, Tansel Uras, Sven Koenig

Session 2A – Virtual Agents

2A_1

Research in the behavioral sciences suggests that emotion can serve important social functions and that, more than a simple manifestation of internal experience, emotion displays communicate one's beliefs, desires and intentions. In a recent study we have shown that, when engaged in the iterated prisoner's dilemma with agents that display emotion, people infer, from the emotion displays, how the agent is appraising the ongoing interaction (e.g., is the situation favorable to the agent? Does it blame me for the current state-of-affairs?). From these appraisals people, then, infer whether the agent is likely to cooperate in the future.
In this paper we propose a Bayesian model that captures this social function of emotion. The model supports probabilistic predictions, from emotion displays, about how the counterpart is appraising the interaction which, in turn, lead to predictions about the counterpart's intentions. The model's parameters were learned using data from the empirical study. Our evaluation indicated that considering emotion displays improved the model's ability to predict the counterpart's intentions, in particular, how likely it was to cooperate in a social dilemma. Using data from another empirical study where people made inferences about the counterpart's likelihood of cooperation in the absence of emotion displays, we also showed that the model could, from information about appraisals alone, make appropriate inferences about the counterpart's intentions. Overall, the paper suggests that appraisals are valuable for computational models of emotion interpretation. The relevance of these results for the design of multiagent systems where agents, human or not, can convey or recognize emotion is discussed. Bayesian Model of the Social Effects of Emotion in Decision-Making in Multiagent Systems Celso de Melo, Peter Carnevale, Stephen Read, Dimitrios Antos, Jonathan Gratchhas 2 papers

2A_2

Nonverbal behavior is considered critical for indicating intimacy and is important when designing a social virtual agent such as a counselor. One key research question is how properly to express intimate self-disclosure. In this paper we present an extensive study of human nonverbal behavior during intimate self-disclosure. This is an important milestone in creating a virtual counselor. A study of video interactions between human participants demonstrated that people display more head tilts and pauses when they revealed highly intimate information about themselves; they presented more head nods and eye gazes during less intimate sharing. An implementation of these behaviors in a virtual agent suggests that people tend to perceive head tilts, pauses and gaze aversion by the agent as conveying intimate self-disclosure. These findings are important for future research with virtual counselors and other social agents. Towards building a Virtual Counselor: Modeling Nonverbal Behavior during Intimate Self-Disclosure Sin-Hwa Kang, Jonathan Gratchhas 2 papers, Candy Sidner, Ron Artstein, Lixing Huang, Louis-Phillippe Morency

2A_3

In story-based games or other interactive story systems, a Drama Manager is an omniscient agent that acts to bring about a particular sequence of plot points for the user to experience. We present a Drama Manager that uses player modeling to personalize the user's story according to his or her storytelling preferences. In order to deliver personalized stories, a Drama Manager must make decisions on not only which plot points to be included into the unfolding story but also the optimal sequence of the events the user should experience. A prefix based collaborative filtering algorithm based on users' structural feedback is proposed to address the sequential selection problem. We demonstrate our system on a simple interactive story generation system based on choose-your-own-adventure stories to evaluate our algorithms. Results on human users and simulated users show that our Drama Manager is capable of capturing users' preference and generating personalized stories with high accuracy.
A Sequential Recommendation Approach for Interactive Personalized Story Generation Hong Yu, Mark Riedl

2A_4

This paper presents the intelligent virtual animals that inhabit Omosa, a virtual learning environment to help secondary school students learn how to conduct scientific inquiry and gain concepts from biology. Omosa supports multiple agents, including animals, plants, and human hunters, which live in groups of varying sizes and in a predator-prey relationship with other agent types (species). In this paper we present our generic agent architecture and the algorithms that drive all animals. We concentrate on two of our animals to present how different parameter values affect their movements and inter/intra-group interactions. Two evaluations studies are included: one to demonstrate the effect of different components of our architecture; another to provide domain expert validation of the animal behavior. Evaluating the Models & Behaviour of 3D Intelligent Virtual Animals in a Predator-Prey Relationship Deborah Richardshas 2 papers, Michael J. Jacobsonhas 2 papers, John Portehas 2 papers, Charlotte Taylorhas 2 papers, Meredith Taylorhas 2 papers, Anne Newsteadhas 2 papers, Iwan Kelaiahhas 2 papers, Nader Hannahas 2 papers

2A_5

A smile may convey different communicative intentions depending on subtle characteristics of the facial expression. Moreover, during an interaction, the expression of smile impacts on the observer's perception of both the social stance of the speaker and of the content of the talk. In this paper, we describe a perceptual study where we explore the effects of virtual characters displaying different types of smiles (namely politeness and amusement) when speaking on the user's perception. Based on the collected data, a model to automatically compute the user's potential perception of the virtual character's social stance depending on its smiling behavior and on its gender has been proposed. Model of the Perception of Smiling Virtual Character Magalie Ochs, Catherine Pelachaudhas 2 papers

Session 2B – Distributed Problem Solving

2B_1

Distributed constraint optimization problems (DCOPs) are well-suited for modeling multi-agent coordination problems where the primary interactions are between local subsets of agents. However, one limitation of DCOPs is the assumption that the constraint rewards are without uncertainty. Researchers have thus extended DCOPs to Stochastic DCOPs (SDCOPs), where rewards are sampled from known probability distribution reward functions, and introduced algorithms to find solutions with the largest expected reward. Unfortunately, such a solution might be very \emph{risky}, that is, very likely to result in a poor reward. Thus, in this paper, we make three contributions: (1) we propose a stricter objective for SDCOPs, namely to find a solution with the most stochastically dominating probability distribution reward function; (2) we introduce an algorithm to find such solutions; and (3) we show that stochastically dominating solutions can indeed be less risky than expected reward maximizing solutions. Stochastic Dominance in Stochastic DCOPs for Risk Sensitive Applications Duc Thien Nguyen, William Yeohhas 3 papers, Hoong Chuin Lauhas 2 papers

2B_2

Distributed Constraint Optimization Problems (DCOPs) are NP-hard and therefore the number of studies that consider incomplete algorithms for solving them is growing. Specifically, the Max-sum algorithm has drawn attention in recent years and has been applied to a number of realistic applications. Unfortunately, in many cases Max-sum does not produce high quality solutions. More specifically, when problems include cycles of various sizes in the factor graph upon which Max-sum performs, the algorithm does not converge and the states that it visits are of low quality.
In this paper we advance the research on incomplete algorithms for DCOPs by: (1) Proposing a version of the Max-sum algorithm that operates on an alternating directed acyclic graph (Max-sum\_AD), which guarantees convergence in linear time. (2) Identifying major weaknesses of Max-sum and Max-sum\_AD that cause inconsistent costs/utilities to be propagated and affect the assignment selection. (3) Solving the identified problems by introducing value propagation to Max-sum\_AD. Our empirical study reveals a large improvement in the quality of the solutions produced by Max-sum\_AD with value propagation (VP), when solving problems which include cycles, compared with the solutions produced by the standard Max-sum algorithm, Bounded Max-sum and Max-sum\_AD with no value propagation. Max/Min-sum Distributed Constraint Optimization through Value Propagation on an Alternating DAG Roie Zivanhas 2 papers, Hilla Peled

2B_3

Several multiagent tasks can be formulated and solved as DCOPs. BnB-ADOPT⁺-AC is one of the most efficient algorithms for optimal DCOP solving. It is based on BnB-ADOPT, removing redundant messages and maintaining soft arc consistency during search. In this paper, we present several improvements for this algorithm, namely (i) a better implementation (ii) processing exactly simultaneous deletions and (iii) searching on arc consistent cost functions. We present empirical results showing the benefits of these improvements on several benchmarks. Improving BnB-ADOPT⁺-AC Patricia Gutierrezhas 3 papers, Pedro Meseguerhas 2 papers

2B_4

Distribution network operators face a number of challenges; capacity constrained networks, and balancing electricity demand with generation from intermittent renewable resources. Thus, there is an increasing need for scalable approaches to facilitate optimal dispatch in the distribution network. To this end, we cast the optimal dispatch problem as a decentralised agent-based coordination problem and formalise it as a DCOP. We show how this can be decomposed as a factor graph and solved in a decentralised manner using algorithms based on the generalised distributive law; in particular, the max-sum algorithm. We go on to show that max-sum applied na\"{\i}vely in this setting performs a large number of redundant computations. To address this issue, we present a novel decentralised message passing algorithm using dynamic programming that outperforms max-sum by pruning the search space. We empirically evaluate our algorithm using real data, showing that it outperforms (in terms of computational time and total size of messages sent) both a centralised approach, which uses IBM's ILOG CPLEX 12.2, and max-sum, for large networks. Optimal Decentralised Dispatch of Embedded Generation in the Smart Grid Sam Millerhas 2 papers, Sarvapali Ramchurnhas 2 papers, Alex Rogershas 9 papers

2B_5

Real life coordination problems are characterised by stochasticity and a lack of \emph{a priori} knowledge about the interactions between agents. However, decentralised constraint optimisation problems (DCOPs), a widely accepted framework for modelling decentralised coordination problems, assumes perfect knowledge, thus limiting its practical applicability. To address this shortcoming, we introduce the MAB-DCOP, in which the interactions between agents are modelled by multi-armed bandits (MABs). Unlike canonical DCOPs, a MAB-DCOP is not a single shot optimisation problem. Rather, it is a sequential one in which agents need to coordinate in order to strike a balance between acquiring knowledge about the \emph{a priori} unknown and stochastic interactions (exploration), and taking the currently believed optimal joint action (exploitation), so as to maximise the cumulative global utility over a finite time horizon.
We propose \textsc{Heist}, the first asymptotically optimal algorithm for coordination under stochasticity and lack of prior knowledge. \textsc{Heist} solves MAB-DCOPs in a decentralised fashion using a generalised distributive law (GDL) message passing phase to find the joint action with the highest upper confidence bound (UCB) on global utility. We demonstrate that \textsc{Heist} outperforms other state of the art techniques from the MAB and DCOP literature by up to 1.5 orders of magnitude on MAB-DCOPs in experimental settings. DCOPs and Bandits: Exploration and Exploitation in Decentralised Coordination Ruben Stranders, Long Tran-Thanh, Francesco Maria Delle Favehas 2 papers, Alex Rogershas 9 papers, Nick Jenningshas 11 papers

Session 2C – Learning II

2C_1

Solving complex but structured problems in a decentralized manner via multiagent collaboration has received much attention in recent years. This is natural, as on one hand, multiagent systems usually possess a structure that determines the allowable interactions among the agents; and on the other hand, the single most pressing need in a cooperative multiagent system is to coordinate the local policies of autonomous agents with restricted capabilities to serve a system-wide goal. The presence of uncertainty makes this even more challenging, as the agents face the additional need to learn the unknown environment parameters while forming (and following) local policies in an online fashion. In this paper, we provide the first Bayesian reinforcement learning (BRL) approach for distributed coordination and learning in a cooperative multiagent system by devising two solutions to this type of problem. More specifically, we show how the Value of Perfect Information (VPI) can be used to perform efficient decentralised exploration in both model-based and model-free BRL, and in the latter case, provide a closed form solution for VPI, correcting a decade old result by Dearden, Friedman and Russell. To evaluate these solutions, we present experimental results comparing their relative merits, and demonstrate empirically that both solutions outperform an existing multiagent learning method, representative of the state-of-the-art. Decentralized Bayesian Reinforcement Learning for Online Agent Collaboration Luke Teacy, Georgios Chalkiadakishas 4 papers, Alessandro Farinellihas 2 papers, Alex Rogershas 9 papers, Nick Jenningshas 11 papers, Sally McClean, Gerard Parr

2C_2

Coevolution is a natural approach to evolve teams of agents which must cooperate to achieve some system objective. However, in many coevolutionary approaches, credit assignment is often subjective and context dependent, as the fitness of an individual agent strongly depends on the actions of the agents with which it collaborates. In order to alleviate this problem, we introduce a cooperative coevolutionary algorithm which biases the evolutionary search as well as shapes agent fitness functions to reward behavior that benefits the system. More specifically, we bias the search using a hall of fame approximation of optimal collaborators, and we shape the agent fitness using the difference objective functions. Our results show that shaping agent fitness with the difference objective improves system performance by up to 50%, and adding an additional fitness bias can improve performance by up to 75%. Shaping Fitness Functions for Coevolving Cooperative Multiagent Systems Mitchell Colby, Kagan Tumerhas 2 papers

2C_3

Potential-based reward shaping can significantly improve the time needed to learn an optimal policy and, in multi-agent systems, the performance of the final joint-policy. It has been proven to not alter the optimal policy of an agent learning alone or the Nash equilibria of multiple agents learning together.
However, a limitation of existing proofs is the assumption that the potential of a state does not change dynamically during the learning. This assumption often is broken, especially if the reward-shaping function is generated automatically.
In this paper we prove and demonstrate a method of extending potential-based reward shaping to allow dynamic shaping and maintain the guarantees of policy invariance in the single-agent case and consistent Nash equilibria in the multi-agent case. Dynamic Potential-Based Reward Shaping Sam Devlin, Daniel Kudenko

2C_4

Factored models of multiagent systems address the complexity of joint behavior by exploiting locality in agent interactions. \emph{History-dependent graphical multiagent models} (hGMMs) further capture dynamics by conditioning behavior on history. The challenges of modeling real human behavior motivated us to extend the hGMM representation by distinguishing two types of agent interactions. This distinction opens the opportunity for learning dependence networks that are different from given graphical structures representing observed agent interactions. We propose a greedy algorithm for learning hGMMs from time-series data, inducing both graphical structure and parameters. Our empirical study employs human-subject experiment data for a dynamic consensus scenario, where agents on a network attempt to reach a unanimous vote. We show that the learned hGMMs directly expressing joint behavior outperform alternatives in predicting dynamic human voting behavior, and end-game vote results. Analysis of learned graphical structures reveals patterns of action dependence not directly reflected in the original experiment networks. Learning and Predicting Dynamic Networked Behavior with Graphical Multiagent Models Quang Duong, Michael Wellmanhas 2 papers, Satinder Singhhas 3 papers, Michael Kearns

Session 2D – Social Choice II

2D_1

The Internet Engineering Task Force develops and promotes Internet standards like TCP/IP. The chair of the Task Force is chosen by an election which starts with a set of voters being selected at random from the electorate of volunteers. Selecting decision makers by lottery like this has a long and venerable history, having been used in Athenian democracy over two millennia ago, as well as for over 500 years from the 13th Century to elect the Doge of Venice. In this paper, we consider using such lotteries in multi-agent decision making. We study a family of voting rules called lot-based voting rules. Such rules have two steps: in the first step, $k$ votes are selected by a lottery, then in the second round (the runoff), a voting rule is applied to select the winner based on these $k$ votes. We study some normative properties of such lot-based rules. We also investigate the computational complexity of computing the winner with weighted and unweighted votes, and of computing manipulations. We show that for most lot-based voting rules winner determination and manipulation are computationally hard. Our results suggest that this general technique (using lotteries to selecting some voters randomly) may help to prevent strategic behavior of the voters from a computational point of view. Lot-based Voting Rules Toby Walsh, Lirong Xia

2D_2

In multiagent systems, social choice functions can help aggregate the distinct preferences that agents have over alternatives, enabling them to settle on a single choice. Despite the basic manipulability of all reasonable voting systems, it would still be desirable to find ways to reach a \emph{stable} result, i.e., a situation where no agent would wish to change its vote. One possibility is an iterative process in which, after everyone initially votes, participants may change their votes, one voter at a time. This technique, explored in previous work, converges to a Nash equilibrium when Plurality voting is used, along with a tie-breaking rule that chooses a winner according to a linear order of preferences over candidates.
In this paper, we both consider limitations of the iterative voting method, as well as expanding upon it. We demonstrate the significance of tie-breaking rules, showing that when using a general tie-breaking rule, no scoring rule (nor Maximin) need iteratively converge. However, using a restricted tie-breaking rule (such as the linear order rule used in previous work) does not by itself \emph{ensure} convergence. We demonstrate that many scoring rules (such as Borda) need not converge, regardless of the tie-breaking rule. On a more encouraging note, we prove that Iterative Veto does converge - but that voting rules "between" Plurality and Veto, $k$-approval rules, do not. Convergence of Iterative Voting Omer Lev, Jeffrey Rosenscheinhas 2 papers

2D_3

Complexity of voting manipulation is a prominent research topic in computational social choice. In this paper, we study the complexity of {\em optimal} manipulation, i.e., finding a manipulative vote that achieves the manipulator's goal yet deviates as little as possible from her true ranking. We study this problem for three natural notions of closeness, namely, swap distance, footrule distance, and maximum displacement distance, and a variety of voting rules, such as scoring rules, Bucklin, Copeland, and Maximin. For all three distances, we obtain poly-time algorithms for all scoring rules and Bucklin and hardness results for Copeland and Maximin. Optimal Manipulation of Voting Rules Svetlana Obraztsova, Edith Elkindhas 3 papers

2D_4

An important research topic in the field of computational social choice is the complexity of various forms of dishonest behavior, such as manipulation, control, and bribery. While much of the work on this topic assumes that the cheating party has full information about the election, recently there have been a number of attempts to gauge the complexity of non-truthful behavior under uncertainty about the voters' preferences. In this paper, we analyze the complexity of (coalitional) manipulation for the setting where there is uncertainty about the voting rule: the manipulator(s) know that the election will be conducted using a voting rule from a given list, and need to select their votes so as to succeed no matter which voting rule will eventually be chosen. We identify a large class of voting rules such that arbitrary combinations of rules from this class are easy to manipulate; in particular, we show that this is the case for single-voter manipulation and essentially all easy-to-manipulate voting rules, and for coalitional manipulation and $k$-approval. While a combination of a hard-to-manipulate rule with an easy-to-manipulate one is usually hard to manipulate -- we prove this in the context of coalitional manipulation for several combinations of prominent voting rules -- we also provide counterexamples showing that this is not always the case. Manipulation Under Voting Rule Uncertainty Edith Elkindhas 3 papers, Gábor Erdélyi

2D_5

We develop a formal model of opinion polls in elections and study how they influence the voting behaviour of the participating agents, and thereby election outcomes. This approach is particularly relevant to the study of collective decision making by means of voting in multiagent systems, where it is reasonable to assume that we can precisely model the amount of information available to agents and where agents can be expected to follow relatively simple rules when adjusting their behaviour in response to polls. We analyse two settings, one where a single agent strategises in view of a single poll, and one where multiple agents repeatedly update their voting intentions in view of a sequence of polls. In the single-poll setting we vary the amount of information a poll provides and examine, for different voting rules, when an agent starts and stops having an incentive to manipulate the election. In the repeated-poll setting, using both analytical and experimental methods, we study how the properties of different voting rules are affected under different sets of assumptions on how agents will respond to polling information. Together, our results clarify under which circumstances sharing information via opinion polls can improve the quality of election outcomes and under which circumstances it may have negative effects, due to the increased opportunities for manipulation it provides.
Voter Response to Iterated Poll Information Annemieke Reijngoud, Ulle Endriss

Session 2E – Game Theory II

2E_1

The paper introduces a class of games in extensive form where players take strategic decisions while not having access to the terminal histories of the game, hence being unable to solve it by standard backwards induction. This class of games is studied along two directions: first, by providing an appropriate refinement of the subgame perfect equilibrium concept, a corresponding extension of the backwards induction algorithm and an equilibrium existence theorem; second, by showing that these games are a well-behaved subclass of a class of games with possibly unaware players recently studied in the literature. Short Sight in Extensive Games Davide Grossi, Paolo Turrini

2E_2

The computational study of strategic interactions situations has recently deserved a lot of attention in multi-agent systems. A number of results on strategic-form games and zero-sum extensive-form games are known in the literature, while general-sum extensive-form games are not studied in depth. We focus on the problem to decide whether or not a solution is a refinement of the Nash equilibrium (NE) for extensive-form games. Refinements are needed because the NE concept is not satisfactory for this game class. While verifying whether a solution is an NE is in $\mathcal{P}$, verifying whether it is a NE refinement may be not (all the results known so far show $\mathcal{NP}$-hardness). In this paper, we provide the first positive result, showing that verifying a \emph{sequential equilibrium} with any number of agents and a \emph{quasi perfect equilibrium} with two agents are in $\mathcal{P}$. We show also that when the input is expressed in (non-perturbed) sequence form even the problem to verify a subgame perfect equilibrium is $\mathcal{NP}$-complete and that sequence form, if applicable, must be rethought to verify (and therefore to compute) an extensive-form perfect equilibrium. New Results on the Verification of Nash Refinements for Extensive-Form Games Nicola Gattihas 3 papers, Fabio Panozzohas 2 papers

2E_3

In Stackelberg games, a "leader" player first chooses a mixed strategy to commit to, then a "follower" player responds based on the observed leader strategy. Notable strides have been made in scaling up the algorithms for such games, but the problem of finding optimal leader strategies spanning multiple rounds of the game, with a Bayesian prior over unknown follower preferences, has been left unaddressed. Towards remedying this shortcoming we propose a first-of-a-kind tractable method to compute an optimal plan of leader actions in a repeated game against an unknown follower, assuming that the follower plays myopic best-response in every round. Our approach combines Monte Carlo Tree Search, dealing with leader exploration/exploitation tradeoffs, with a novel technique for the identification and pruning of dominated leader strategies. The method provably finds asymptotically optimal solutions and scales up to real world security games spanning double-digit number of rounds.
Playing Repeated Stackelberg Games with Unknown Opponents Janusz Mareckihas 2 papers, Gerry Tesauro, Richard Segal

2E_4

When a zero-sum game is played once, a risk-neutral player will want to maximize his expected outcome in that single play. However, if that single play instead only determines how much one player must pay to the other, and the same game must be played again, until either player runs out of money, optimal play may differ. Optimal play may require using different strategies depending on how much money has been won or lost. Computing these strategies is rarely feasible, as the state space is often large. This can be addressed by playing the same strategy in all situations, though this will in general sacrifice optimality. Purely maximizing expectation for each round in this way can be arbitrarily bad. We therefore propose a new solution concept that has guaranteed performance bounds, and we provide an efficient algorithm for computing it. The solution concept is closely related to the Aumann-Serrano index of riskiness, that is used to evaluate different gambles against each other. The primary difference is that instead of being offered fixed gambles, the game is adversarial. Repeated zero-sum games with budget Troels Sørensen

2E_5

Recently, there has been considerable progress towards algorithms for approximating Nash equilibrium strategies in extensive games. One such algorithm, Counterfactual Regret Minimization (CFR), has proven to be effective in two-player, zero-sum poker domains. While the basic algorithm is iterative and performs a full game traversal on each iteration, sampling based approaches are possible. For instance, chance-sampled CFR considers just a single chance outcome per traversal, resulting in faster but less precise iterations. While more iterations are required, chance-sampled CFR requires less time overall to converge. In this work, we present new sampling techniques that consider sets of chance outcomes during each traversal to produce slower, more accurate iterations. By sampling only the public chance outcomes seen by all players, we take advantage of the imperfect information structure of the game to (i) avoid recomputation of strategy probabilities, and (ii) achieve an algorithmic speed improvement, performing O($n^2$) work at terminal nodes in O($n$) time. We demonstrate that this new CFR update converges more quickly than chance-sampled CFR in the large domains of poker and Bluff. Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization Michael Johanson, Nolan Bard, Marc Lanctot, Richard Gibson, Michael Bowling

Session 2F – Knowledge Representation & Reasoning

2F_1

Memory enables past experiences to be remembered and acquired as useful knowledge to support decision making, especially when perception and computational resources are limited. This paper presents a neuropsychological-inspired dual memory model for agents, consisting of an episodic memory that records the agent's experience in real time and a semantic memory that captures factual knowledge through a parallel consolidation process. In addition, the model incorporates a natural forgetting mechanism that prevents memory overloading by removing transient memory traces. Our experimental study based on a real-time first-person-shooter video game has indicated that the memory consolidation and forgetting processes are not only able to extract valuable knowledge and regulate the memory capacity, but they can mutually improve the effectiveness of learning the knowledge for the given task in hand. Interestingly, a moderate level of forgetting may even improve the task performance rather than disadvantaging it. We suggest that the interplay between rapid memory formation, consolidation, and forgetting processes points to a practical and effective approach for learning agents to acquire and maintain useful knowledge from experiences in a scalable manner. Memory Formation, Consolidation, and Forgetting in Learning Agents Budhitama Subagdjahas 2 papers, Wenwen Wang, Ah-Hwee Tanhas 2 papers, Yuan-Sin Tan, Loo-Nin Teow

2F_2

Interactive multiagent decision making often requires to predict actions of other agents by solving their behavioral models from the perspective of the modeling agent. Unfortunately, the general space of models in the absence of constraining assumptions tends to be very large thereby making multiagent decision making intractable. One approach that can reduce the model space is to cluster \emph{behaviorally equivalent models} that exhibit identical policies for the whole planning horizon. Currently, the state-of-the art on identifying equivalence of behavioral models compares partial policy trees instead of entire trees. In this paper, we further improve the use of partial trees for the identification purpose and develop an incremental comparison strategy in order to efficiently ascertain the model equivalence. We investigate the improved approach in a well-defined probabilistic graphical model for sequential multiagent decision making - interactive dynamic influence diagrams, and evaluate its performance over multiple problem domains. Improved Use of Partial Policies for Identifying Behavioral Equivalence Yifeng Zeng, Yinghui Pan, Hua Mao, Jian Luo

2F_3

In this paper we provide a neural-symbolic framework to model, reason about and learn norms in multi-agent systems. To this purpose, we define a fragment of Input/Output (I/O) logic that can be embedded into a neural network. We extend d'Avila Garcez et al. Connectionist Inductive Learning and Logic Programming System (CILP) to translate an I/O logic program into a Neural Network (NN) that can be trained further with examples: we call this new system Normative-CILP (N-CILP). We then present a new algorithm to handle priorities between rules in order to cope with normative issues like Contrary to Duty (CTD), Priorities, Exceptions and Permissions. We illustrate the applicability of the framework on a case study based on RoboCup rules: within this working example, we compare the learning capacity of a network built with N-CILP with a non symbolic neural network, we explore how the initial knowledge impacts on the overall performance, and we test the NN capacity of learning norms, generalizing new Contrary to Duty rules from examples. Learning and Reasoning about Norms using Neural-Symbolic Systems Guido Boella, Silvano Colombo Tosatto, Artur d'Avila Garcez, Valerio Genovese, Perotti Alan, Leendert van der Torrehas 3 papers

2F_4

We investigate agent supervision, a form of customization, which constrains the actions of an agent so as to enforce certain desired behavioral specifications. This is done in a setting based on the Situation Calculus and a variant of the ConGolog programming language which allows for nondeterminism, but requires the remainder of a program after the execution of an action to be determined by the resulting situation. Such programs can be fully characterized by the set of action sequences that they generate. Hence operations like intersection and difference become natural. The main results of the paper are a characterization of the maximally permissive supervisor that minimally constrains the agent so as to enforce the desired behavioral constraints when some agent actions are uncontrollable, and a sound and complete technique to execute the agent as constrained by such a supervisor. On Supervising Agents in Situation-Determined ConGolog Giuseppe De Giacomo, Yves Lespérance, Christian Muise

2F_5

Policy iteration algorithms for partially observable Markov decision processes (POMDP) offer the benefits of quick convergence and the ability to operate directly on the solution, which usually takes the form of a finite state controller. However, the controller tends to grow quickly in size across iterations due to which its evaluation and improvement become costly. Bounded policy iteration provides a way of keeping the controller size fixed while improving it monotonically until convergence, although it is susceptible to getting trapped in local optima. Despite these limitations, policy iteration algorithms are viable alternatives to value iteration.
In this paper, we generalize the bounded policy iteration technique to problems involving multiple agents. Specifically, we show how we may perform policy iteration in settings formalized by the interactive POMDP framework. Although policy iteration has been extended to decentralized POMDPs, the context there is strictly cooperative. Its generalization here makes it useful in non-cooperative settings as well. As interactive POMDPs involve modeling others, we ascribe nested controllers to predict others' actions, with the benefit that the controllers compactly represent the model space. We evaluate our approach on multiple problem domains, and demonstrate its properties and scalability. Generalized and Bounded Policy Iteration for Finitely-Nested Interactive POMDPs: Scaling Up Ekhlas Sonuhas 2 papers, Prashant Doshihas 3 papers

Session 3A – Robotics I

3A_1

In this paper, we study a heterogeneous robot team composed of self-assembling robots and aerial robots that cooperate with each other to carry out global tasks. We introduce \emph{supervised morphogenesis} -- an approach in which aerial robots exploit their better view of the environment to detect tasks on the ground that require self-assembly, and perform on-board simulations to determine the morphology most adequate to carry out the task. In case existing morphologies on the ground do not match those determined in simulation, aerial robots use a series of enabling mechanisms to initiate and control (hence supervise) the formation of morphologies more adequate to carry out the task. Supervised morphogenesis solely employs LEDs and camera-based local communication between the two robot types. We validate the applicability of our approach in a real-world scenario, in which ground-based robots are given the task to cross an unknown, undulated terrain by forming ad-hoc morphologies under the supervision of an aerial robot. Supervised Morphogenesis - Morphology Control of Ground-based Self-Assembling Robots by Aerial Robots Nithin Mathews, Alessandro Stranieri, Alexander Scheidler, Marco Dorigohas 2 papers

3A_2

A central problem in environmental sensing and monitoring is to classify/label the hotspots in a large-scale environmental field. This paper presents a novel \emph{decentralized active robotic exploration} (DARE) strategy for probabilistic classification/labeling of hotspots in a \emph{Gaussian process} (GP)-based field. In contrast to existing state-of-the-art exploration strategies for learning environmental field maps, the time needed to solve the DARE strategy is independent of the map resolution and the number of robots, thus making it practical for in situ, real-time active sampling. Its exploration behavior exhibits an interesting formal trade-off between that of boundary tracking until the hotspot region boundary can be accurately predicted and wide-area coverage to find new boundaries in sparsely sampled areas to be tracked. We provide a theoretical guarantee on the active exploration performance of the DARE strategy: under reasonable conditional independence assumption, we prove that it can optimally achieve two formal cost-minimizing exploration objectives based on the misclassification and entropy criteria. Importantly, this result implies that the uncertainty of labeling the hotspots in a GP-based field is greatest at or close to the hotspot region boundaries. Empirical evaluation on real-world plankton density and temperature field data shows that, subject to limited observations, DARE strategy can achieve more superior classification of hotspots and time efficiency than state-of-the-art active exploration strategies. Decentralized Active Robotic Exploration and Mapping for Probabilistic Field Classification in Environmental Sensing Kian Hsiang Lowhas 3 papers, Jie Chen, John Dolanhas 2 papers, Steve Chien, David Thompson

3A_3

Frontier-based exploration is the most common approach to exploration, a fundamental problem in robotics. In frontier-based exploration, robots explore by repeatedly computing (and moving towards) \emph{frontiers}, the segments which separate the known regions from those unknown. However, most frontier detection algorithms process the entire map data. This can be a time consuming process which slows down the exploration. In this paper, we present two novel frontier detection algorithms: \emph{WFD}, a graph search based algorithm and \emph{FFD} , which is based on processing only the new laser readings data. In contrast to state-of-the-art methods, both algorithms do not process the entire map data. We implemented both algorithms and showed that both are faster than a state-of-the-art frontier detector implementation (by several orders of magnitude). Robot Exploration with Fast Frontier Detection: Theory and Experiments Matan Keidar, Gal Kaminkahas 3 papers

3A_4

We consider the problem of dynamic self-reconfiguration in a modular self-reconfigurable robot (MSR). Previous approaches to MSR self-reconfiguration solve this problem using algorithms that search for a goal configuration in the MSR's configuration space. In contrast, we model the self-reconfiguration problem as a constrained optimization problem that attempts to minimize the reconfiguration cost while achieving a desirable configuration. We formulate the MSR self-reconfiguration problem as finding the optimal coalition structure within a coalition game theoretic framework. To reduce the complexity of finding the optimal coalition structure, we represent the set of all robot modules as a fully-connected graph. Each robot module corresponds to a vertex of the graph and edge weights represent the utility of a pair of modules being in the same coalition (or, connected component). The value of a coalition structure is then defined as the sum of the weights of all edges that are completely within the same coalition in that coalition structure. We then use a graph partitioning technique to cluster the vertices (robot modules) in the constructed graph so that the obtained coalition structure has close to optimal value. The clustering algorithm has time complexity polynomial in the number of agents, $n$, and yields an O(log $n$) approximation. We have verified our technique experimentally for a variety of settings. Our results show that the graph clustering-based self-reconfiguration algorithm performs comparably with two other existing algorithms for determining optimal coalition structures. Dynamic Reconfiguration in Modular Robots using Graph Partitioning-based Coalitions Prithviraj Dasgupta, Vladimir Ufimtsev, Carl Nelson, S. G. M. Hossain

3A_5

This paper presents the architecture and key components of a simulated humanoid robot soccer team, UT Austin Villa, which was designed to compete in the RoboCup 3D simulation competition. These key components include (1) an omnidirectional walk engine and associated walk parameter optimization framework, (2) an inverse kinematics based kicking architecture, and (3) a dynamic role assignment and positioning system. UT Austin Villa won the RoboCup 2011 3D simulation competition in convincing fashion by winning all 24 games it played. During the course of the competition the team scored 136 goals while conceding none. We analyze the effect of each component in isolation and show through extensive experiments that the complete team significantly outperforms all the other teams from the competition. UT Austin Villa 2011: A Champion Agent in the RoboCup 3D Soccer Simulation Competition Patrick MacAlpinehas 2 papers, Daniel Urieli, Samuel Barretthas 2 papers, Shivaram Kalyanakrishnan, Francisco Barrera, Adrian Lopez-Mobilia, Nicolae Ştiurcă, Victor Vu, Peter Stonehas 5 papers

Session 3C – Human-agent Interaction

3C_1

People's cultural background has been shown to affect how they reach agreement in negotiation and how they fulfil these agreements. This paper presents a novel agent design for negotiating with people from different cultures. Our setting involved an alternating-offer protocol that allowed parties to choose the extent to which they kept each of their agreements during the negotiation. A challenge to designing agents for such setting is to predict how people reciprocate their actions over time despite the scarcity of prior data of their behavior across different cultures. Our methodology addresses this challenge by combining a decision theoretic model with classical machine learning techniques to predict how people respond to offers, and the extent to which they fulfill agreements. This agent was evaluated empirically by playing with 157 people in three countries--Lebanon, the U.S., and Israel---in which people are known to vary widely in their negotiation behavior. The agent was able to outperform people in all countries under conditions that varied how parties depended on each other at the onset of the negotiation. This is the first work to show that a computer agent can learn to outperform people when negotiating in three countries representing different cultures. A Cultural Sensitive Agent for Human-Computer Negotiation Galit Haimhas 2 papers, Ya'akov (Kobi) Gal, Sarit Kraushas 4 papers, Michele Gelfand

3C_2

We present a novel computational method for advice-generation in path selection problems which are difficult for people to solve. The advisor agent's interests may conflict with the interests of the people who receive the advice. Such optimization settings arise in many human-computer applications in which agents and people are self-interested but also share certain goals, such as automatic route-selection systems that also reason about environmental costs. This paper presents an agent that clusters people into one of several types, based on how their path selection behavior adheres to the paths presented to them by the agent who does not necessarily suggest their most preferred paths. It predicts the likelihood that people will deviate from these suggested paths and uses a decision theoretic approach to suggest paths to people which will maximize the agent's expected benefit given the people's deviations. This technique was evaluated empirically in an extensive study involving hundreds of human subjects solving the path selection problem in mazes. Results showed that the agent was able to outperform alternative methods that solely considered the benefit to the agent or the person, or did not provide any advice. Giving Advice to People in Path Selection Problems Amos Azariahas 2 papers, Zinovi Rabinovich, Sarit Kraushas 4 papers, Claudia Goldman, Omer Tsimhoni

3C_3

We show how machine learning and inference can be harnessed to leverage the complementary strengths of humans and computational agents to solve crowdsourcing tasks. We construct a set of Bayesian predictive models from data and describe how the models operate within an overall crowdsourcing architecture that combines the efforts of people and machine vision on the task of classifying celestial bodies defined within a citizens' science project named Galaxy Zoo. We show how learned probabilistic models can be used to fuse human and machine contributions and to predict the behaviors of workers. We employ multiple inferences in concert to guide decisions on hiring and routing workers to tasks so as to maximize the efficiency of large-scale crowdsourcing processes based on expected utility. Combining Human and Machine Intelligence in Large-scale Crowdsourcing Ece Kamarhas 2 papers, Severin Hacker, Eric Horvitzhas 3 papers

3C_4

As computational agents are increasingly used beyond research labs, their success will depend on their ability to learn new skills and adapt to their dynamic, complex environments. If human users - without programming skills - can transfer their task knowledge to agents, learning can accelerate dramatically, reducing costly trials. The \textsc{tamer} framework guides the design of agents whose behavior can be shaped through signals of approval and disapproval, a natural form of human feedback. More recently, \textsc{tamer+rl} was introduced to enable human feedback to augment a traditional reinforcement learning (RL) agent that learns from a Markov decision process's (MDP) reward signal. We address limitations of prior work on \textsc{tamer} and \textsc{tamer+rl}, contributing in two critical directions. First, the four successful techniques for combining human reward with RL from prior \textsc{tamer+rl} work are tested on a second task, and these techniques' sensitivities to parameter changes are analyzed. Together, these examinations yield more general and prescriptive conclusions to guide others who wish to incorporate human knowledge into an RL algorithm. Second, \textsc{tamer+rl} has thus far been limited to a \emph{sequential} setting, in which training occurs before learning from MDP reward. In this paper, we introduce a novel algorithm that shares the same spirit as \textsc{tamer+rl} but learns \emph{simultaneously} from both reward sources, enabling the human feedback to come at any time during the reinforcement learning process. We call this algorithm simultaneous \textsc{tamer+rl}. To enable simultaneous learning, we introduce a new technique that appropriately determines the magnitude of the human model's influence on the RL algorithm throughout time and state-action space. Reinforcement Learning from Simultaneous Human and MDP Reward W. Bradley Knox, Peter Stonehas 5 papers

3C_5

Both Learning from Demonstration (LfD) and Reinforcement Learning (RL) are popular approaches for building decision-making agents. LfD applies supervised learning to a set of human demonstrations to infer and imitate the human policy, while RL uses only a reward signal and exploration to find an optimal policy. For complex tasks both of these techniques may be ineffective. LfD may require many more demonstrations than it is feasible to obtain, and RL can take an inadmissible amount of time to converge.
We present Automatic Decomposition and Abstraction from demonstration (ADA), an algorithm that uses mutual information measures over a set of human demonstrations to decompose a sequential decision process into several subtasks, finding state abstractions for each one of these subtasks. ADA then projects the human demonstrations into the abstracted state space to build a policy. This policy can later be improved using RL algorithms to surpass the performance of the human teacher. We find empirically that ADA can find satisficing policies for problems that are too complex to be solved with traditional LfD and RL algorithms. In particular, we show that we can use mutual information across state features to leverage human demonstrations to reduce the effects of the curse of dimensionality by finding subtasks and abstractions in sequential decision processes. Automatic Task Decomposition and State Abstraction from Demonstration Luis C. Cobo, Charles L. Isbell Jr., Andrea Thomaz

Session 3D – Economies & Markets I

3D_1

A market maker sets prices over time for wagers that pay out contingent on the future state of the world. The market maker has knowledge of the probability of realizing each state of the world, and of how the price of a bet affects the probability that traders will accept it. We compare the optimal policy for risk-neutral (expected utility maximizing) and Kelly criterion (expected log utility maximizing) market makers. Computing the optimal policy for a risk-neutral market maker is relatively simple, while computing the optimal policy for a Kelly criterion market maker is challenging, requiring advanced techniques adapted from the computational economics literature to run efficiently. We show that while a risk-neutral market maker has an optimal policy that does not depend on the market maker's state, a Kelly criterion market maker's optimal policy has an intricate dependence on both time and state. Counter-intuitively, a Kelly criterion market maker may offer bets that are myopically irrational with respect to the market maker's beliefs for the entire trading period. In contrast, a risk-neutral market maker never offers a myopically irrational bet. Rational Market Making with Probabilistic Knowledge Abraham Othman, Tuomas Sandholmhas 4 papers

3D_2

Many agent-based models of financial markets have been able to reproduce certain stylized facts that are observed in actual empirical time series data by using "zero-intelligence" agents whose behaviour is largely random in order to ascertain whether certain phenomena arise from market micro-structure as opposed to strategic behaviour. Although these models have been highly successful, it is not surprising that they are unable to explain \emph{every} stylized fact, and indeed it seems plausible that although some phenomena arise purely from market micro-structure, other phenomena arise from the behaviour of the participating agents, as suggested by more complex agent-based models which use agents endowed with various forms of strategic behaviour. Given that both zero-intelligence and strategic models are each able to explain various phenomena, an interesting question is whether there are hybrid, "zero-intelligence plus" models containing a minimal amount of strategic behaviour that are simultaneously able to explain all of the stylized facts. We conjecture that as we gradually increase the level of strategic behaviour in a zero-intelligence model of a financial market we will obtain an increasingly good fit with the stylized facts of empirical financial time-series data. We test this hypothesis by systematically evaluating several different experimental treatments in which we incrementally add minimalist levels of strategic behaviour to our model, and test the resulting time series of returns for the following statistical features: fat tails, volatility clustering, persistence and non-Gaussianity. Surprisingly, the resulting "zero-intelligence plus" models do mph{not} introduce more realism to the time series, thus supporting other research which conjectures that some phenomena in the financial markets are indeed the result of more sophisticated learning, interaction and adaptation. Can a Zero-Intelligence Plus Model Explain the Stylized Facts of Financial Time Series Data? Imon Palit, Steve Phelps, Wing Lon Ng

3D_3

This paper presents a novel scoring rule-based strictly dominant incentive compatible mechanism that encourages agents to produce costly estimates of future events and report them truthfully to a centre. Whereas prior work has assumed a fixed budget for payment towards agents, this work makes use of prior information held by the centre and assumes a budget that is determined by the savings made through the use of the agents' information over the centre's own prior information. This mechanism is compared to a simple benchmark mechanism wherein the savings are divided equally among all home agents, and a cooperative solution wherein agents act to maximise social welfare. Empirical analysis is performed in which the mechanism is applied to a simulation of the smart grid whereby an aggregator agent must use home agents' information to optimally purchase electricity. It is shown that this mechanism achieves up to 77% of the social welfare achieved by the cooperative solution. A Scoring Rule-based Mechanism for Aggregate Demand Prediction in the Smart Grid Harry Rose, Alex Rogershas 9 papers, Enrico Gerdinghas 3 papers

3D_4

We introduce a novel online mechanism that schedules the allocation of an expiring and continuously-produced resource to self-interested agents with private preferences. A key application of our mechanism is the charging of pure electric vehicles, where owners arrive dynamically over time, and each owner requires a minimum amount of charge by its departure to complete its next trip. To truthfully elicit the agents' preferences in this setting, we introduce the new concept of pre-commitment: Whenever an agent is selected, our mechanism pre-commits to charging the vehicle by its reported departure time, but maintains flexibility about \emph{when} the charging takes place and at \emph{what rate}. Furthermore, to make effective allocation decisions we use a model-based approach by modifying Consensus, a well-known online optimisation algorithm. We show that our pre-commitment mechanism with modified Consensus incentivises truthful reporting. Furthermore, through simulations based on real-world data, we show empirically that the average utility achieved by our mechanism is 93% or more of the offline optimal. A Model-Based Online Mechanism with Pre-Commitment and its Application to Electric Vehicle Charging Sebastian Steinhas 2 papers, Enrico Gerdinghas 3 papers, Valentin Robuhas 2 papers, Nick Jenningshas 11 papers

3D_5

A principal seeks production of a good within a limited time-frame with a hard deadline, after which any good procured has no value. There is inherent uncertainty in the production process, which in light of the deadline may warrant simultaneous production of multiple goods by multiple producers despite there being no marginal value for extra goods beyond the \emph{maximum quality} good produced. This motivates a \emph{crowdsourcing} model of procurement. We address efficient execution of such procurement from a social planner's perspective, taking account of and optimally balancing the value to the principal with the costs to producers (modeled as effort expenditure) while, crucially, contending with self-interest on the part of all players. A solution to this problem involves both an algorithmic aspect that determines an optimal effort level for each producer given the principal's value, and also an incentive mechanism that equilibrium implementation of the socially optimal policy despite the principal privately observing his value, producers privately observing their skill levels and effort expenditure, and all acting selfishly to maximize their own individual welfare. In contrast to popular "winner take all" contests, the efficient mechanism we propose involves a payment to every producer that expends non-zero effort in the efficient policy. Efficient Crowdsourcing Contests Ruggiero Cavallo, Shaili Jain

Session 3E – Game Theory III

3E_1

To step beyond the first-generation deployments of attacker-defender security games -- for LAX Police, US FAMS and others -- it is critical that we relax the assumption of perfect rationality of the human adversary. Indeed, this assumption is a well-accepted limitation of classical game theory and modeling human adversaries' bounded rationality is critical. To this end, quantal response (QR) has provided very promising results to model human bounded rationality. However, in computing optimal defender strategies in real-world security games against a QR model of attackers, we face difficulties including (1) solving a nonlinear non-convex optimization problem efficiently for massive real-world security games; and (2) addressing constraints on assigning security resources, which adds to the complexity of computing the optimal defender strategy.
This paper presents two new algorithms to address these difficulties: \textsc{GOSAQ} can compute the globally optimal defender strategy against a QR model of attackers when there are no resource constraints and gives an efficient heuristic otherwise; \textsc{PASAQ} in turn provides an efficient approximation of the optimal defender strategy with or without resource constraints. These two novel algorithms are based on three key ideas: (i) use of a binary search method to solve the fractional optimization problem efficiently, (ii) construction of a convex optimization problem through a non-linear transformation, (iii) building a piecewise linear approximation of the non-linear terms in the problem. Additional contributions of this paper include proofs of approximation bounds, detailed experimental results showing the advantages of extsc{GOSAQ} and \textsc{PASAQ} in solution quality over the benchmark algorithm (\textsc{BRQR}) and the efficiency of \textsc{PASAQ}. Given these results, \textsc{PASAQ} is at the heart of the PROTECT system, which is deployed for the US Coast Guard in the port of Boston, and is now headed to other ports. Computing Optimal Strategy against Quantal Response in Security Games Rong Yanghas 5 papers, Fernando Ordóñezhas 2 papers, Milind Tambehas 13 papers

3E_2

Given their existing and potential real-world security applications, Bayesian Stackelberg games have received significant research interest. In these games, the defender acts as a leader, and the many different follower types model the uncertainty over discrete attacker types. Unfortunately since solving such games is an NP-hard problem, scale-up has remained a difficult challenge.
This paper scales up Bayesian Stackelberg games, providing a novel unified approach to handle uncertainty not only over discrete follower types but also other key continuously distributed real world uncertainty, due to the leader's execution error, the follower's observation error, and continuous payoff uncertainty. To that end, this paper provides contributions in two parts. First, we present a new algorithm for Bayesian Stackelberg games, called \textsc{hunter}, to scale up the number of types. \textsc{hunter} combines the following five key features: i) efficient pruning via a best-first search of the leader's strategy space; ii) a novel linear program for computing tight upper bounds for this search; iii) using Bender's decomposition for solving the upper bound linear program efficiently; iv) efficient inheritance of Bender's cuts from parent to child; v) an efficient heuristic branching rule. Our experiments show that \textsc{hunter} provides orders of magnitude speedups over the best existing methods to handle discrete follower types. In the second part, we show \textsc{hunter}'s efficiency for Bayesian Stackelberg games can be exploited to also handle the continuous uncertainty using sample average approximation. We experimentally show that our \textsc{hunter}based approach also outperforms latest robust solution methods under continuously distributed uncertainty. A Unified Method for Handling Discrete and Continuous Uncertainty in Bayesian Stackelberg Games Zhengyu Yinhas 4 papers, Milind Tambehas 13 papers

3E_3

The burgeoning area of security games has focused on real-world domains where security agencies protect critical infrastructure from a diverse set of adaptive adversaries. There are security domains where the payoffs for preventing the different types of adversaries may take different forms (seized money, reduced crime, saved lives, etc) which are not readily comparable. Thus, it can be difficult to know how to weigh the different payoffs when deciding on a security strategy. To address the challenges of these domains, we propose a fundamentally different solution concept, multi-objective security games (MOSG), which combines security games and multi-objective optimization. Instead of a single optimal solution, MOSGs have a set of Pareto optimal (non-dominated) solutions referred to as the Pareto frontier. The Pareto frontier can be generated by solving a sequence of constrained single-objective optimization problems (CSOP), where one objective is selected to be maximized while lower bounds are specified for the other objectives. Our contributions include: (i) an algorithm, Iterative $\epsilon$-Constraints, for generating the sequence of CSOPs; (ii) an exact approach for solving an MILP formulation of a CSOP (which also applies to multi-objective optimization in more general Stackelberg games); (iii) heuristics that achieve speedup by exploiting the structure of security games to further constrain a CSOP; (iv) an approximate approach for solving an algorithmic formulation of a CSOP, increasing the scalability of our approach with quality guarantees. Additional contributions of this paper include proofs on the level of approximation and detailed experimental evaluation of the proposed approaches. Multi-Objective Optimization for Security Games Matthew Brown, Bo Anhas 3 papers, Christopher Kiekintveld, Fernando Ordóñezhas 2 papers, Milind Tambehas 13 papers

3E_4

There has been significant recent interest in computing effective strategies for playing large imperfect-information games. Much prior work involves computing an approximate equilibrium strategy in a smaller abstract game, then playing this strategy in the full game (with the hope that it also well approximates an equilibrium in the full game). In this paper, we present a family of modifications to this approach that work by constructing non-equilibrium strategies in the abstract game, which are then played in the full game. Our new procedures, called \emph{purification} and \emph{thresholding}, modify the action probabilities of an abstract equilibrium by preferring the higher-probability actions. Using a variety of domains, we show that these approaches lead to significantly stronger play than the standard equilibrium approach. As one example, our program that uses purification came in first place in the two-player no-limit Texas Hold'em total bankroll division of the 2010 Annual Computer Poker Competition. Surprisingly, we also show that purification significantly improves performance (against the full equilibrium strategy) in random 4 x 4 matrix games using random 3 x 3 abstractions. We present several additional results (both theoretical and empirical). Overall, one can view these approaches as ways of achieving robustness against overfitting one's strategy to one's lossy abstraction. Perhaps surprisingly, the performance gains do not necessarily come at the expense of worst-case exploitability. Strategy Purification and Thresholding: Effective Non-Equilibrium Approaches for Playing Large Games Sam Ganzfried, Tuomas Sandholmhas 4 papers, Kevin Waugh

3E_5

Moving assets through a transportation network is a crucial challenge in hostile environments such as future battlefields where malicious adversaries have strong incentives to attack vulnerable patrols and supply convoys. Intelligent agents must balance network costs with the harm that can be inflicted by adversaries who are in turn acting rationally to maximize harm while trading off against their own costs to attack. Furthermore, agents must choose their strategies even without full knowledge of their adversaries' capabilities, costs, or incentives.
In this paper we model this problem as a non-zero sum game between two players, a sender who chooses flows through the network and an adversary who chooses attacks on the network. We advance the state of the art by: (1) moving beyond the zero-sum games previously considered to non-zero sum games where the adversary incurs attack costs that are not incorporated into the payoff of the sender; (2) introducing a refinement of the Stackelberg equilibrium that is more appropriate to network security games than previous solution concepts; and (3) using Bayesian games where the sender is uncertain of the capabilities, payoffs, and costs of the adversary. We provide polynomial time algorithms for finding equilibria in each of these cases. We also show how our approach can be applied to games where there are multiple adversaries. Solving Non-Zero Sum Multiagent Network Flow Security Games with Attack Costs Steven Okamoto, Noam Hazon, Katia Sycarahas 4 papers

Session 3F – Agent-based Software Development

3F_1

In Belief Desire Intention (BDI) agent systems it is usual for goals to have a number of plans that are possible ways of achieving the goal, applicable in different situations, usually captured by a \emph{context condition}. In Agent Oriented Software Engineering it has been suggested that a designer should be conscious of whether a goal has \emph{complete coverage}, that is, is there some plan that is applicable for every situation. Similarly a designer should be conscious of \emph{overlap}, that is, for a given goal, are there situations where more than one plan could be applicable for achieving that goal. In this paper we further develop these notions in two ways, and then describe how they can be used both in agent reasoning and agent system development. Firstly, we replace the boolean value for basic coverage and overlap with numerical measures, and explain how these may be calculated. Secondly, we describe a measure that combines these basic measures, with the characteristics of the coverage/overlap in the goal-plan tree below a given goal. We then describe how these domain independent mesures can be used for both plan selection and intention selection, as well as for guidance in agent system design and development. Measuring Plan Coverage and Overlap for Agent Reasoning John Thangarajahhas 4 papers, Sebastian Sardinahas 2 papers, Lin Padghamhas 4 papers

3F_2

Normative organisations provide a means to coordinate the activities of individual agents in multiagent settings. The coordination is realized at run time by creating obligations and prohibitions (norms) for individual agents. If an agent cannot meet an obligation or violates a prohibition, the organisation imposes a sanction on the agent. In this paper, we consider \emph{norm-aware} agents that deliberate on their goals, norms and sanctions before deciding which plan to select and execute. A norm-aware agent is able to violate norms (accepting the resulting sanctions) if it is in the agent's overall interests to do so, e.g., if meeting an obligation would result in an important goal of the agent becoming unachievable. Programming norm-aware agents in conventional BDI-based agent programming languages is difficult, as they lack support for deliberating about goals, norms and sanctions and deadlines. We present the norm-aware agent programming language N-2APL. N-2APL is based on 2APL and provides support for beliefs, goals, plans, norms, sanctions and deadlines. We give the syntax and semantics of N-2APL, and show that N-2APL agents are rational in the sense of committing to a set of plans that will achieve the agent's most important goals and obligations by their deadlines while respecting its most important prohibitions. Programming Norm-Aware Agents Natasha Alechinahas 3 papers, Mehdi Dastanihas 4 papers, Brian Loganhas 2 papers

3F_3

A great number of methodologies has been already introduced in the agent-oriented software engineering field. Recently many of the authors of these methodologies also worked on their fragmentation thus obtaining portions (often called method or process fragments) that may be composed into new methodologies. The great advancement in this field, however does not correspond to equivalent results in the evaluation of the methodologies and their fragments. It is, for instance, difficult to select a fragment in the composition of a new methodology and to predict the methodology's resulting features. This work introduces a suite of metrics for evaluating and comparing entire methodologies but also their composing fragments. The proposed metrics are based on the multi-agent system metamodel. The metrics have been applied to the ADELFE and PASSI methodologies, results prove the usefulness of the proposed approach and encourage further studies on the matter. Metamodel-Based Metrics for Agent-Oriented Methodologies Noélie Bonjean, Antonio Chella, Massimo Cossentino, Marie-Pierre Gleizes, Frédéric Migeon, Valeria Seidita

3F_4

We introduce Comma, a methodology for developing cross-organizational business models. Comma gives prime position to patterns of business relationships understood in terms of commitments. In this manner, it contrasts with traditional operational approaches such as RosettaNet that are commonly used in industry.

We report the results of a developer study comparing Comma with a methodology recommended by the RosettaNet Consortium. Ours is one of the only evaluations of an agent-oriented methodology that (1) involves developers other than the proposing researchers and (2) compares against a traditional nonagent approach.

We found that Comma yields improved model quality, a greater focus in relative effort on the more important aspects of modeling, and a general reduction in total time despite yielding more comprehensive models. Certain anomalies in effort expended point toward the need for improved tooling. Comma: A Commitment-Based Business Modeling Methodology and its Empirical Evaluation Pankaj Telanghas 3 papers, Munindar Singhhas 3 papers

3F_5

Autonomous agents typically have several goals they are pursuing simultaneously. Even if the goals themselves are not necessarily inconsistent, choices made about how to pursue each of these goals may well result in a set of intentions which are conflicting. A rational autonomous agent should be able to reason about and modify its set of intentions to take account of such issues. This paper presents the semantics of some preferences regarding modified sets of intentions. We look at the possibility of simply deleting some intention(s) but more importantly we also look at the possibility of modifying intentions, such that the goals will still be achieved but in a different way. Revising Conflicting Intention Sets in BDI Agents Steven Shapiro, Sebastian Sardinahas 2 papers, John Thangarajahhas 4 papers, Lawrence Cavedonhas 2 papers, Lin Padghamhas 4 papers

Session 4A – Robotics II

4A_1

In this paper, we propose a novel top-down design method for the development ofcollective behaviors of swarm robotics systems called \emph{property-driven design}. Swarm robotics systems are usually designed and developed using a \emph{code-and-fix} approach, that is, the developer devises, tests and modifies the individual robot behaviors until a desired collective behavior is obtained. The code-and-fix approach can be very time consuming and relies completely on the ingenuity and expertise of the designer. The idea of property-driven design is that a swarm robotics system can be described by specifying formally a set of desired properties. In an iterative process similar to test-driven development, the developer produces a model of the system that satisfies the desired properties. Subsequently, the system is implemented in simulation and using real robots. Property-driven design helps to minimize the risk of developing a system that does not satisfy the required properties, and to promote the reuse of hardware-independent models. In this paper, we start by giving a general description of the method. We then present a possible way to apply it by using Discrete Time Markov Chains (DTMC) and Probabilistic Computation Tree Logic* (PCTL*). Finally, we conclude by presenting the application of the proposed method to the design and development of a swarm robotics system performing aggregation. Property-driven design for swarm robotics Manuele Brambilla, Carlo Pinciroli, Mauro Birattari, Marco Dorigohas 2 papers

4A_2

This paper describes a multi-robot collision avoidance system based on the velocity obstacle paradigm. In contrast to previous approaches, we alleviate the strong requirement for perfect sensing (i.e. global positioning) using Adaptive Monte-Carlo Localization on a per-agent level. While such methods as Optimal Reciprocal Collision Avoidance guarantee local collision-free motion for a large number of robots, given perfect knowledge of positions and speeds, a realistic implementation requires further extensions to deal with inaccurate localization and message passing delays. The presented algorithm bounds the error introduced by localization and combines the computation for collision-free motion with localization uncertainty. We provide an open source implementation using the Robot Operating System (ROS). The system is tested and evaluated with up to eight robots in simulation and on four differential drive robots in a real-world situation. Multi-robot collision avoidance with localization uncertainty Daniel Henneshas 3 papers, Daniel Claeshas 2 papers, Wim Meeussenhas 2 papers, Karl Tuylshas 5 papers

4A_3

This paper presents a novel decision-theoretic approach to control and coordinate multiple active cameras for observing a number of moving targets in a surveillance system. This approach offers the advantages of being able to (a) account for the stochasticity of targets' motion via probabilistic modeling, and (b) address the trade-off between maximizing the expected number of observed targets and the resolution of the observed targets through stochastic optimization. One of the key issues faced by existing approaches in multi-camera surveillance is that of scalability with increasing number of targets. We show how its scalability can be improved by exploiting the problem structure: as proven analytically, our decision-theoretic approach incurs time that is linear in the number of targets to be observed during surveillance. As demonstrated empirically through simulations, our proposed approach can achieve high-quality surveillance of up to 50 targets in real time and its surveillance performance degrades gracefully with increasing number of targets. We also demonstrate our proposed approach with real AXIS 214 PTZ cameras in maximizing the number of Lego robots observed at high resolution over a surveyed rectangular area. The results are promising and clearly show the feasibility of our decision-theoretic approach in controlling and coordinating the active cameras in real surveillance system. Decision-Theoretic Approach to Maximizing Observation of Multiple Targets in Multi-Camera Surveillance Prabhu Natarajanhas 2 papers, Trong Nghia Hoanghas 2 papers, Kian Hsiang Lowhas 3 papers, Mohan Kankanhalli

4A_4

When a mixture of particles with different attributes undergoes vibration, a segregation pattern is often observed. For example, in muesli cereal packs, the largest particles - the Brazil nuts - tend to end up at the top. For this reason, the phenomenon is known as the Brazil nut effect. In previous research, an algorithm inspired by this effect was designed to produce segregation patterns in swarms of simulated agents that move on a horizontal plane.
In this paper, we adapt this algorithm for implementation on robots with directional vision. We use the e-puck robot as a platform to test our implementation. In a swarm of e-pucks, different robots mimic disks of different sizes (larger than their physical dimensions). The motion of every robot is governed by a combination of three components: (i) attraction towards a point, which emulates the effect of a gravitational pull, (ii) random motion, which emulates the effect of vibration, and (iii) repulsion from nearby robots, which emulates the effect of collisions between disks. The algorithm does not require robots to discriminate between other robots; yet, it is capable of forming annular structures where the robots in each annulus represent disks of identical size.
We report on a set of experiments performed with a group of 20 physical e-pucks. The results obtained in 100 trials of 20 minutes each show that the percentage of incorrectly-ordered pairs of disks from different groups decreases as the size ratio of disks in different groups is increased. In our experiments, this percentage was, on average, below 0.5% for size ratios from 3.0 to 5.0. Moreover, for these size ratios, all segregation errors observed were due to mechanical failures that caused robots to stop moving. Segregation in Swarms of e-puck Robots Based On the Brazil Nut Effect Jianing Chenhas 2 papers, Melvin Gauci, Michael J. Price, Roderich Groß

4A_5

Modern model-driven engineering and Agent-Oriented Software Engineering (AOSE) methods are rarely utilized in developing robotic software. In this paper, we show how a Model-Driven AOSE methodology can be used for specifying the behavior of multi-robot teams. Specifically, the Agent Systems Engineering Methodology (ASEME) was used for developing the software that realizes the behavior of a physical robot team competing in the Standard Platform League of the RoboCup competition (the robot soccer world cup). The team consists of four humanoid robots, which play soccer autonomously in real time utilizing the on-board sensing, processing, and actuating capabilities, while communicating and coordinating with each other in order to achieve their common goal of winning the game. Our work focuses on the challenges of coordinating the base functionalities (object recognition, localization, motion skills) within each robot (intra-agent control) and coordinating the activities of the robots towards a desired team behavior (inter-agent control). We discuss the difficulties we faced and present the solutions we gave to a number of practical issues, which, in our view, are inherent in applying any AOSE methodology to robotics. We demonstrate the added value of using an AOSE methodology in the development of robotic systems, as ASEME allowed for a platform-independent team behavior specification, automated a large part of the code generation process, and reduced the total development time. Model-Driven Behavior Specification for Robotic Teams Alexandros Paraschos, Nikolaos Spanoudakis, Michail Lagoudakis

Session 4B – Agent Societies

4B_1

In an Evolutionary Algorithm (EA) for optimization problems, candidate solutions to the problems are individuals in a population. They produce offsprings by taking evolutionary operators with user-specific control parameters. The challenge is then how to effectively select evolutionary operators and adjust control parameters from generation to generation and on different problems. We propose a novel multiagent evolutionary framework based on trust where each solution is represented as an intelligent agent, and evolutionary operators and control parameters are represented as services. Agents select services in each generation based on trust that measures the competency or suitability of the services for solving particular problems. Multiobjective Optimization Problems (MOPs) are used to showcase the value of our framework. Experimental studies on 35 benchmark MOPs show that our framework significantly improves the performance of the state-of-the-art EAs. A Multiagent Evolutionary Framework based on Trust for Multiobjective Optimization Siwei Jianghas 2 papers, Jie Zhanghas 2 papers, Yew-Soon Onghas 2 papers

4B_2

We propose a novel method for assessing the reputation of agents in multiagent systems that is capable of exploiting the structure and semantics of rich agent interaction protocols and agent communication languages. Our method is based on using so-called \emph{conversation models}, i.e. succinct, qualitative models of agents' behaviours derived from the application of data mining techniques on protocol execution data in a way that takes advantage of the semantics of inter-agent communication available in many multiagent systems. Contrary to existing systems, which only allow for querying agents regarding their assessment of others' reputation in an \emph{outcome-based} way (often limited to distinguishing between "successful" and "unsuccessful" interactions), our method allows for contextualised queries regarding the structure of past interactions, the values of content variables, and the behaviour of agents across different protocols. Moreover, this is achieved while preserving maximum privacy for the reputation querying agent and the witnesses queried, and without requiring a common definition of reputation, trust or reliability among the agents exchanging reputation information. A case study shows that, even with relatively simple reputation measures, our qualitative method outperforms quantitative approaches, proving that we can meaningfully exploit the additional information afforded by rich interaction protocols and agent communication semantics. A qualitative reputation system for multiagent systems with protocol-based communication Emilio Serranohas 2 papers, Michael Rovatsos, Juan Botia

4B_3

Several reputation models have been introduced to deal with the problem of biased reputation providers. Most of these models discount or discard biased information received from the reputation providers, and most of them are not appropriate when a large population of information providers are biased or dishonest. In this paper, we present a probabilistic approach for reputation modeling, the Probabilistic Reputation model (PRep). PRep models a reputation provider's behavior, and uses this model to re-interpret the reported information, thus making use of the entire reputation reports effectively, even if they are biased. The re-interpreted data is combined with the agent's direct experiences to determine an overall level of trust in the third-party agent. We show that PRep significantly outperforms two state-of-the-art trust and reputation models - HAPTIC and TRAVOS - and improves the overall payoff in a game-theoretic environment. PRep: A Probabilistic Reputation Model for Biased Societies Yasaman Haghpanah, Marie desJardins

4B_4

Despite a large body of research on integrating organizational concepts into cooperative multiagent systems, a formal understanding of how organizations can influence agents' decisions remains elusive. This paper works toward such an understanding by beginning with a model of agent decision making based on decision-theoretic principles, and then examining the possible routes that organizational influences can take to affect that model. We show that alternative avenues of applying influences correspond to different prior notions of organizational control, and empirically demonstrate the impact that each can have on the quality and overhead of coordinated behavior. To do so, we must define the agents' baseline behavior (without a designed organization), and we present a methodology for initializing agents' models to comprise what amounts to an "uninformed" organization. Finally, we show how the specification of organizational influences in terms of components of a decision-theoretic agent creates opportunities for agents to compare actual events with predictions implied in the models, such that agents can reason about whether to change organizations. We demonstrate that this capability to question and change organizations can be valuable if used judiciously. A Decision-Theoretic Characterization of Organizational Influences Jason Sleight, Ed Durfeehas 2 papers

4B_5

The use of norms in multiagent systems has proven to be a successful approach in order to coordinate and regulate the behaviour of participating agents. In such normative systems it is generally assumed that agents can obey or disobey norms. In this paper, we develop a logical framework for normative systems that allows reasoning about agents' abilities under a multitude of norm compliance assumptions. In particular, we investigate different types of norm compliance and propose an extension of Alternating Temporal Logic (ATL) to reason about the abilities of (coalitions of) agents under different types of norm compliance assumptions. For this extension we show that the problem of model-checking remains close to the domain of standard ATL. Finally, we show that some norms can limit an agent's autonomy in the sense that an agent cannot control the violation of these norms. We present and discuss various classes of the so-called self-supporting norms, i.e., norms for which individual agents have control over their violations Reasoning under Compliance Assumptions in Normative Multiagent Systems Max Knobbout, Mehdi Dastanihas 4 papers

Session 4C – Argumentation & Negotiation

4C_1

An argumentation framework can be seen as expressing, in an abstract way, the conflicting information of an underlying logical knowledge base. This conflicting information often allows for the presence of more than one possible reasonable position (extension/labelling) which one can take. A relevant question, therefore, is how much these positions differ from each other. In the current paper, we will examine the issue of how to define meaningful measures of distance between the (complete) labellings of a given argumentation framework. We provide concrete distance measures based on argument-wise label difference, as well as based on the notion of critical sets, and examine their properties. Quantifying Disagreement in Argument-based Reasoning Richard Booth, Martin Caminada, Mikolaj Mikołaj, Iyad Rahwanhas 2 papers

4C_2

We introduce an approach to cooperative dialogues as a framework for group deliberation. One of its distinguishing features is that it deals with conditional and constraint-based arguments, which are built by employing abductive and hypothetical reasoning. These kinds of arguments allow agents to use a variety of dialogue moves proper to a cooperative debate, such as argument rewrites and conditional attacks. In our approach, a group of agents develops a dialogue as they explore different lines of thought to build a group position in a yes or no decision. In essence, given a matter for discussion, the parties involved will consider arguments that either supports or rejects it and discuss such arguments to decide whether or not to accept them. To achieve that, agents will work as a team and combine their knowledge to produce more complex arguments and study possible flaws these might have. Cooperative Dialogues with Conditional Arguments Samy Sá, João Alcântara

4C_3

This contribution presents a practical extension of a theoretical model for multi-agent planning based upon DeLP, an argumentation-based defeasible logic. Our framework, named DeLP-MAPOP, is implemented on a platform for open multi-agent systems and has been experimentally tested, among others, in applications of ambient intelligence in the field of health-care. DeLP-MAPOP is based on a multi-agent partial order planning paradigm in which agents have diverse abilities, use an argumentation-based defeasible reasoning to support their own beliefs and refute the beliefs of the others according to their knowledge during the plan search process. The requirements of Ambient Intelligence (AmI) environments featured by the imperfect nature of the context information and heterogeneity of the involved agents make defeasible argumentation be an ideal approach to resolve potential conflicts caused by the contradictory information coming from the ambient agents. Moreover, the ability of AmI systems to build a course of action to achieve the user's needs is also a claiming capability in such systems. DeLP-MAPOP shows to be an adequate approach to tackle AmI problems as it gathers together in a single framework the ability of planning while it allows agents to put forward arguments that support or argue upon the accuracy, unambiguity and reliability of the context-aware information. Defeasible Argumentation for Multi-Agent Planning in Ambient Intelligence Applications Sergio Pajares Ferrando, Eva Onaindia

4C_4

Agents in open multi-agent systems must deal with the difficult problem of selecting interaction partners in the face of uncertainty about their behaviour. This is especially problematic if they have to interact with an agent they have not interacted with before. In this case they can turn to their peers for information about this potential partner. However, in scenarios where agents may be evaluated according to many different criteria for many different purposes, their peers' evaluations may be mismatched with regards to their own expectations. In this paper we present a novel method, using an argumentation framework, that allows agents to discuss and adapt their trust model. This allows agents to provide, and receive, personalized trust evaluations, better suited to the agent in need, as is shown in a prototypical experiment. Personalizing Communication about Trust Andrew Koster, Jordi Sabater-Mirhas 2 papers, Marco Schorlemmer

4C_5

In this paper, we introduce axiomatic and strategic models for bargaining and investigate the link between the two. Bargaining situations are described in propositional logic while the agents' preferences over the outcomes are expressed as ordinal preferences. Our main contribution is an axiomatic theory of bargaining. We propose a bargaining solution based on the well-known egalitarian social welfare for bargaining problems in which the agents' logical beliefs specify their bottom lines. We prove that the proposed solution is uniquely identified by a set of axioms. We further present a model of bargaining based on argumentation frameworks with the view to develop a strategic model of bargaining using the concept of minimal concession strategy in argument-based negotiation frameworks. From axiomatic to strategic models of bargaining with logical beliefs and goals Bao Vo, Minyi Li

Session 4D – Economies & Markets II

4D_1

The question of how to influence people in a large social system is a perennial problem in marketing, politics, and publishing. It differs from more personal inter-agent interactions that occur in negotiation and argumentation since network structure and group membership often pay a more significant role than the content of what is being said, making the messenger more important than the message. In this paper, we propose a new method for propagating information through a social system and demonstrate how it can be used to develop a product advertisement strategy in a simulated market. We consider the desire of agents toward purchasing an item as a random variable and solve the influence maximization problem in steady state using an optimization method to assign the advertisement of available products to appropriate messenger agents. Our market simulation accounts for the 1) effects of group membership on agent attitudes 2) has a network structure that is similar to realistic human systems 3) models inter-product preference correlations that can be learned from market data. The results show that our method is significantly better than network analysis methods based on centrality measures.
Identifying Influential Agents for Advertising in Multi-agent Markets Mahsa Maghami, Gita Sukthankarhas 2 papers

4D_2

We consider a setting in which a worker and a manager may each have information about the likely completion time of a task, and the worker also affects the completion time by choosing a level of effort. The task itself may further be composed of a set of subtasks, and the worker can also decide how many of these subtasks to split out into an explicit prediction task. In addition, a worker can learn about the likely completion time of a task as work on subtasks completes. We characterize a family of scoring rules for the worker and manager that provide three properties: information is truthfully reported, best effort is exerted by the worker in completing tasks as quickly as possible; and collusion is not possible. We also study the factors influencing when a worker will split a task into subtasks, each forming a separate prediction target. Predicting Your Own Effort David F. Bacon, Yiling Chenhas 2 papers, Ian Kash, David Parkeshas 2 papers, Malvika Rao, Manu Sridharan

4D_3

We consider the problem of devising incentive strategies for viral marketing of a product. In particular, we assume that the seller can influence penetration of the product by offering two programs: a) direct incentives to potential buyers (\emph{influence}) and b) referral rewards for customers who influence potential buyers to make the purchase (\emph{exploit connections}). The problem is to determine the optimal timing of these programs over a finite time horizon. In contrast to algorithmic perspective popular in the literature, we take a mean-field approach and formulate the problem as a continuous-time deterministic optimal control problem. We show that the optimal strategy for the seller has a simple structure and can take both forms, namely, \emph{influence-and-exploit} and \emph{exploit-and-influence}. We also show that in some cases it may optimal for the seller to deploy incentive programs mostly for low degree nodes. We support our theoretical results through numerical studies and provide practical insights by analyzing various scenarios. Optimal Incentive Timing Strategies for Product Marketing on Social Networks Pankaj Dayama, Aditya Karnik, Yadati Narahari

4D_4

Kidney exchange, where needy patients swap incompatible donors with each other, offers a lifesaving alternative to waiting for an organ from the deceased-donor waiting list. Recently, \emph{chains} -- sequences of transplants initiated by an altruistic kidney donor -- have shown marked success in practice, yet remain poorly understood. We provide a theoretical analysis of the efficacy of chains in the most widely used kidney exchange model, proving that long chains do not help beyond chains of length of 3 in the large. This completely contradicts our real-world results gathered from the budding nationwide kidney exchange in the United States; there, solution quality improves by increasing the chain length cap to 13 or beyond. We analyze reasons for this gulf between theory and practice, motivated by our experiences running the only nationwide kidney exchange. We augment the standard kidney exchange model to include a variety of real-world features. Experiments in the static setting support the theory and help determine how large is really "in the large". Experiments in the dynamic setting cannot be conducted in the large due to computational limitations, but with up to 460 candidates, a chain cap of 4 was best (in fact, better than 5). Optimizing Kidney Exchange with Transplant Chains: Theory and Reality John Dickerson, Ariel Procaccia, Tuomas Sandholmhas 4 papers

4D_5

We consider the age-old problem of allocating items among different agents in a way that is efficient and fair. Two papers, by Dolev et al. and Ghodsi et al., have recently studied this problem in the context of computer systems. Both papers had similar models for agent preferences, but advocated different notions of fairness. We formalize both fairness notions in economic terms, extending them to apply to a larger family of utilities. Noting that in settings with such utilities efficiency is easily achieved in multiple ways, we study notions of fairness as criteria for choosing between different efficient allocations. Our technical results are algorithms for finding fair allocations corresponding to two fairness notions: Regarding the notion suggested by Ghodsi et al., we present a polynomialtime algorithm that computes an allocation for a general class of fairness notions, in which their notion is included. For the other, suggested by Dolev et al., we show that a competitive market equilibrium achieves the desired notion of fairness, thereby obtaining a polynomial-time algorithm that computes such a fair allocation and solving the main open problem raised by Dolev et al. Fair Allocation Without Trade Avital Gutman, Noam Nisan

Session 4E – Game Theory IV

4E_1

We describe methods for routing a prediction task on a network where each participant can contribute information and route the task onwards. \emph{Routing scoring rules} bring truthful contribution of information about the task and optimal routing of the task into a Perfect Bayesian Equilibrium under common knowledge about the competancies of agents. Relaxing the common knowledge assumption, we address the challenge of routing in situations where each agent's knowledge about other agents is limited to a local neighborhood.
A family of \emph{local routing rules} isolate in equilibrium routing decisions that depend only on this local knowledge, and are the only routing scoring rules with this property. Simulation results show that local routing rules can promote effective task routing. Task Routing for Prediction Tasks Haoqi Zhang, Eric Horvitzhas 3 papers, Yiling Chenhas 2 papers, David Parkeshas 2 papers

4E_2

We consider multi-player games, and the guarantees that a master player that plays on behalf of a set of players can offer them, without making any assumptions on the rationality of the other players. Our model consists of an $(n+1)$-player game, with $m$ strategies per player, in which a \emph{master} player $M$ forms a coalition with nontransferable utilities among $n$ players, and the remaining player is called the {\em independent} player. Existentially, it is shown that every game admits a \emph{product-minimax-safe} strategy for $M$ -- a strategy that guarantees for every player in $M$'s coalition an expected value of at least her \emph{product minimax value} (which is at least as high as her minimax value and is often higher). Algorithmically, for any given vector of values for the players, one can decide in polytime whether it can be ensured by $M$, and if so, compute a mixed strategy that guarantees it. In symmetric games, a product minimax strategy for $M$ can be computed efficiently, even without being given the safety vector. We also consider the performance guarantees that $M$ can offer his players in repeated settings. Our main result here is the extension of the oblivious setting of Feldman, Kalai and Tennenholtz, showing that in every symmetric game, a master player who never observes a single payoff can guarantee for each of its players a {\em similar} performance to that of the independent player, even if the latter gets to choose the payoff matrix after the fact. Mastering multi-player games Yossi Azar, Uriel Feige, Michal Feldmanhas 2 papers, Moshe Tennenholtzhas 2 papers

4E_3

We study the problem of optimal resource allocation for packet selection and inspection to detect potential threats in large computer networks with multiple computers of differing importance. An attacker tries to harm these targets by sending malicious packets from multiple entry points of the network; the defender thus needs to optimally allocate her resources to maximize the probability of malicious packet detection under network latency constraints.
We formulate the problem as a graph-based security game with multiple resources of heterogeneous capabilities and propose a mathematical program for finding optimal solutions. We also propose \textsc{Grande}, a novel polynomial time algorithm that uses an approximated utility function to circumvent the limited scalability caused by the attacker's large strategy space and the non-linearity of the aforementioned mathematical program. \textsc{Grande} computes solutions with bounded error and scales up to problems of realistic sizes. Game-theoretic Resource Allocation for Malicious Packet Detection in Computer Networks Ondřej Vanĕkhas 2 papers, Zhengyu Yinhas 4 papers, Manish Jain, Branislav Bošanskýhas 3 papers, Milind Tambehas 13 papers, Michal Pĕchoučekhas 6 papers

4E_4

We analytically study the role played by the network topology in sustaining cooperation in a society of myopic agents in an evolutionary setting. In our model, each agent plays the Prisoner's Dilemma (PD) game with its neighbours, as specified by a network. Cooperation is the incumbent strategy, whereas defectors are the mutants. Starting with a population of cooperators, some agents are switched to defection. The agents then play the PD game with their neighbours and compute their fitness. After this, an evolutionary rule, or imitation dynamic is used to update the agent strategy. A defector switches back to cooperation if it has a cooperator neighbour with higher fitness. The network is said to sustain cooperation if almost all defectors switch to cooperation. Earlier work on the sustenance of cooperation has largely consisted of simulation studies, and we seek to complement this body of work by providing analytical insight for the same.
We find that in order to sustain cooperation, a network should satisfy some properties such as small average diameter, densification, and irregularity. Real-world networks have been empirically shown to exhibit these properties, and are thus candidates for the sustenance of cooperation. We also analyze some specific graphs to determine whether or not they sustain cooperation. In particular, we find that scale-free graphs belonging to a certain family sustain cooperation, whereas Erdos-Renyi random graphs do not. To the best of our knowledge, ours is the first analytical attempt to determine which networks sustain cooperation in a population of myopic agents in an evolutionary setting. Sustaining Cooperation on Networks: An Analytical Study based on Evolutionary Game Theory Raghunandan Ananthasayanam, Subramanian Chandrasekarapuram

4E_5

Studies in experimental economics have consistently demonstrated that Nash equilibrium is a poor description of human players' behavior in unrepeated normal-form games. Behavioral game theory offers alternative models that more accurately describe human behavior in these settings. These models typically depend upon the values of exogenous parameters, which are estimated based on experimental data. We describe methods for deriving and analyzing the posterior distributions over the parameters of such models, and apply these techniques to study two popular models (Poisson-CH and QLk), the latter of which we previously showed to be the best-performing existing model in a comparison of four widely-studied behavioral models. Drawing on a large set of publicly available experimental data, we derive concrete recommendations for the parameters that should be used with Poisson-CH, contradicting previous recommendations in the literature. We also uncover anomalies in QLk that lead us to develop a new, simpler, and better-performing family of models. Behavioral Game Theoretic Models: A Bayesian Framework For Parameter Analysis James Wright, Kevin Leyton-Brown

Session 4F – Logics for Agency

4F_1

We consider semantic structures and logics that differentiate between being uncertain about a proposition, being unaware of a proposition, becoming aware of a proposition and getting to know the truth value of a proposition. In this paper we give a unified setting to model all this variety of static and dynamic aspects of awareness and knowledge, without any constraints on the modal properties of knowledge (or belief -- such as introspection) or on the interaction between awareness and knowledge (such as awareness introspection). Our primitive epistemic operator is called \emph{speculative knowledge}. This is different from the better known \emph{implicit knowledge}, now definable, which plays a more restricted role. Some dynamic semantic primitives that are elegantly definable in our setting are the actions of 'becoming aware of a propositional variable', 'implicit knowledge', 'adressing a novel issue in an announcement', and also more complex ways in which an agent can become aware of a novel issue by way of increasing the complexity of the epistemic model. Action models for knowledge and awareness Hans van Ditmarschhas 2 papers, Tim French, Fernando R. Velázquez-Quesada

4F_2

Coalition logic is currently one of the most popular logics for multi-agent systems. While logics combining coalitional and epistemic operators have received considerable attention, completeness results for epistemic extensions of coalition logic have so far been missing. In this paper we provide several such results and proofs. We prove completeness for epistemic coalition logic with common knowledge, with distributed knowledge, and with both common and distributed knowledge, respectively. Furthermore, we completely characterise the complexity of the satisfiability problem for each of the three logics. Epistemic Coalition Logic: Completeness and Complexity Thomas Ågotnes, Natasha Alechinahas 3 papers

4F_3

We investigate parameter synthesis in the context of temporal-epistemic logic. We introduce CTLPK, a parametric extension to the branching time temporal-epistemic logic CTLK with free variables representing groups of agents. We give algorithms for automatically synthesising the groups of agents that make a given parametric formula satisfied. We discuss an implementation of the technique on top of the open-source model checker \textsc{MCMAS} and demonstrate its attractiveness by reporting the experimental results obtained. Group Synthesis for Parametric Temporal-Epistemic Logic Andrew Joneshas 2 papers, Michał Knapik, Alessio Lomusciohas 2 papers, Wojciech Penczekhas 2 papers

4F_4

The last decade has been witness to a rapid growth of interest in logics intended to support reasoning about the interactions between knowledge and action. Typically, logics combining dynamic and epistemic components contain ontic actions (which change the state of the world, e.g., switching a light on) or epistemic actions (which affect the information possessed by agents, e.g., making an announcement). We introduce a new logic for reasoning about the interaction between knowledge and action, in which each agent in a system is assumed to perceive some subset of the overall set of Boolean variables in the system; these variables give rise to epistemic indistinguishability relations, in that two states are considered indistinguishable to an agent if all the variables visible to that agent have the same value in both states. In the dynamic component of the logic, we introduce actions $r(p, i)$ and $c(p, i)$: the effect of $r(p, i)$ is to reveal variable $p$ to agent $i$; the effect of $c(p, i)$ is to conceal $p$ from $i$. By using these dynamic operators, we can represent and reason about how the knowledge of agents changes when parts of their environment are concealed from them, or by revealing parts of their environment to them. Our main technical result is a sound and complete axiomatisation for our logic. A Logic of Revelation and Concealment Wiebe van der Hoek, Petar Iliev, Michael Wooldridgehas 2 papers

4F_5

We consider models of multi-player games where abilities of players and coalitions are defined in terms of sets of outcomes which they can effectively enforce. We extend the well studied state effectivity models of one-step games in two different ways. On the one hand, we develop multiple state effectivity functions associated with different long-term temporal operators. On the other hand, we define and study coalitional path effectivity models where the outcomes of strategic play are infinite paths. For both extensions we obtain representation results with respect to concrete models arising from concurrent game structures. We also apply state and path coalitional effectivity models to provide alternative, arguably more natural and elegant semantics to alternating-time temporal logic ATL*, and discuss their technical and conceptual advantages. State and Path Coalition Effectivity Models for Logics of Multi-Player Games Valentin Goranko, Wojciech Jamrogahas 2 papers

Session 5A – Robotics III

5A_1

A key challenge to widespread deployment of mobile robots in the real-world is the ability to robustly and autonomously sense the environment and collaborate with teammates. Real-world domains are characterized by partial observability, non-deterministic action outcomes and unforeseen changes, making autonomous sensing and collaboration a formidable challenge. This paper poses vision-based sensing, information processing and collaboration as an instance of probabilistic planning using partially observable Markov decision processes. Reliable, efficient and autonomous operation is achieved using a hierarchical decomposition that includes: (a) convolutional policies to exploit the local symmetry of high-level visual search; (b) adaptive observation functions, policy re-weighting, automatic belief propagation and online updates of the domain map for autonomous adaptation to domain changes; and (c) a probabilistic strategy for a team of robots to robustly share beliefs. All algorithms are evaluated in simulation and on physical robots localizing target objects in dynamic indoor domains. Active Visual Sensing and Collaboration on Mobile Robots using Hierarchical POMDPs Shiqi Zhang, Mohan Sridharan

5A_2

Infrastructures for implementing agent architectures are currently unaware of what tasks the implemented agent is performing. Such knowledge would allow the infrastructure to improve the agent's autonomy and reliability. For example, the infrastructure could detect abnormal system states, predict likely faults and take preventive measures ahead of time, or balance system load based on predicted computational needs. In this paper we introduce a learning algorithm to automatically discover a state-transition model of the agent's behavior. The algorithm monitors the communication between architectural components, in the form of function calls, and finds the frequencies at which various functions are polled. It then determines the states according to what polling frequencies are active at any time. The two main novel features of the algorithm are that it is completely \emph{unsupervised} (it requires no human input) and \emph{task-agnostic} (it can be applied to any new task or architecture with minimal effort). What am I doing? Automatic Construction of an Agent's State-Transition Diagram through Introspection Constantin Berzan, Matthias Scheutz

5A_3

We present a supervised learning from demonstration system capable of training stateful and recurrent collective behaviors for multiple agents or robots. A model space of this kind is often high-dimensional and consequently may require a large number of samples to learn. Furthermore, the inverse problem posed by emergent macrophenomena among multiple agents presents major challenges to supervised learning methods. Our approach reduces the size of the state space, and shortens the gap between individual behaviors and macrophenomena, by manually decomposing individual behaviors and arranging the agents into a tree hierarchy. This makes it possible to train potentially large numbers of agents using a small number of samples. We demonstrate our system using hundreds of agents in a simulated foraging task, and on real robots performing a collective patrolling task. Learning from Demonstration with Swarm Hierarchies Keith Sullivan, Sean Luke

5A_4

Many robot dances are preprogrammed by choreographers for a particular piece of music so that the motions can be smoothly executed and synchronized with the dance music. We are interested in automating the task of robot dance choreography to allow robots to dance without detailed human planning. Robot dance movements are synchronized to the beats and reflect the emotion of any music. Our work is made up of two parts: (1) The first algorithm plans a sequence of dance movements that is driven by the beats and the emotions detected through the preprocessing of selected dance music. (2) We also contribute a real-time synchronizing algorithm to minimize the error between the execution of the motions and the plan. Our work builds on previous research to extract beats and emotions from music audio. We created a library of parameterized motion primitives, whereby each motion primitive is composed of a set of keyframes and durations and generate the sequence of dance movements from this library. We demonstrate the feasibility of our algorithms on the NAO humanoid robot to show that the robot is capable of using the mappings defined to autonomously dance to any music. Although we present our work using a humanoid robot, our algorithm is applicable to other robots. Autonomous Robot Dancing Driven by Beats and Emotions of Music Guangyu Xia, Junyun Tay, Roger Dannenberg, Manuela Velosohas 3 papers

Session 5B – Teamwork II

5B_1

The growing use of autonomous agents in practice may require agents to cooperate as a team in situations where they have limited prior knowledge about one another, cannot communicate directly, or do not share the same world models. These situations raise the need to design \emph{ad hoc} team members, i.e., agents that will be able to cooperate without coordination in order to reach an optimal team behavior. This paper considers the problem of leading $N$-agent teams by an agent toward their optimal joint utility, where the agents compute their next actions based only on their most recent observations of their teammates' actions. We show that compared to previous results in two-agent teams, in larger teams the agent might not be able to lead the team to the action with maximal joint utility, thus its optimal strategy is to lead the team to the best possible \emph{reachable} cycle of joint actions. We describe a graphical model of the problem and a polynomial time algorithm for solving it. We then consider other variations of the problem, including leading teams of agents where they base their actions on longer history of past observations, leading a team by more than one ad hoc agent, and leading a teammate while the ad hoc agent is uncertain of its behavior. Leading Ad Hoc Agents in Joint Action Settings with Multiple Teammates Noa Agmonhas 2 papers, Peter Stonehas 5 papers

5B_2

This paper is concerned with evaluating different multiagent learning (MAL) algorithms in problems where individual agents may be heterogenous, in the sense of utilising different learning strategies, without the opportunity for prior agreements or information regarding coordination. Such a situation arises in \emph{ad hoc team} problems, a model of many practical multiagent systems applications. Prior work in multiagent learning has often been focussed on homogeneous groups of agents, meaning that all agents were identical and a priori aware of this fact. Also, those algorithms that are specifically designed for ad hoc team problems are typically evaluated in teams of agents with fixed behaviours, as opposed to agents which are adapting their behaviours. In this work, we empirically evaluate five MAL algorithms, representing major approaches to multiagent learning but originally developed with the homogeneous setting in mind, to understand their behaviour in a set of ad hoc team problems. All teams consist of agents which are continuously adapting their behaviours. The algorithms are evaluated with respect to a comprehensive characterisation of repeated matrix games, using performance criteria that include considerations such as attainment of equilibrium, social welfare and fairness. Our main conclusion is that there is no clear winner. However, the comparative evaluation also highlights the relative strengths of different algorithms with respect to the type of performance criteria, e.g., social welfare vs. attainment of equilibrium. Comparative Evaluation of MAL Algorithms in a Diverse Set of Ad Hoc Team Problems Stefano Albrecht, Subramanian Ramamoorthyhas 2 papers

5B_3

In multiagent team settings, the agents are often given a protocol for coordinating their actions. When such a protocol is not available, agents must engage in ad hoc teamwork to effectively cooperate with one another. A fully general ad hoc team agent needs to be capable of collaborating with a wide range of potential teammates on a varying set of joint tasks. This paper presents a framework for analyzing ad hoc team problems that sheds light on the current state of research and suggest avenues for future research. In addition, this paper shows how previous theoretical results can aid ad hoc agents in a set of testbed domains. An Analysis Framework for Ad Hoc Teamwork Tasks Samuel Barretthas 2 papers, Peter Stonehas 5 papers

5B_4

The performance of a team at a task depends critically on the composition of its members. There is a notion of synergy in human teams that represents how well teams work together, and we are interested in modeling synergy in multi-agent teams. We focus on the problem of team formation, i.e., selecting a subset of a group of agents in order to perform a task, where each agent has its own capabilities, and the performance of a team of agents depends on the individual agent capabilities as well as the synergistic effects among the agents. We formally define synergy and how it can be computed using a synergy graph, where the distance between two agents in the graph correlates with how well they work together. We contribute a learning algorithm that learns a synergy graph from observations of the performance of subsets of the agents, and show that our learning algorithm is capable of learning good synergy graphs without prior knowledge of the interactions of the agents or their capabilities. We also contribute an algorithm to solve the team formation problem using the learned synergy graph, and experimentally show that the team formed by our algorithm outperforms a competing algorithm. Modeling and Learning Synergy for Team Formation with Heterogeneous Agents Somchaya Liemhetcharat, Manuela Velosohas 3 papers

Session 5C – Emergence

5C_1

We show how the quality of the decisions based on the aggregated opinions of the crowd can be conveniently studied using a sample of individual responses to a standard IQ questionnaire. We aggregated the responses to the IQ questionnaire using simple majority voting and a machine learning approach based on a probabilistic graphical model. The score for the aggregated questionnaire, Crowd IQ, serves as a quality measure of decisions based on aggregating opinions, which also allows quantifying individual and crowd performance on the same scale.
We show that Crowd IQ grows quickly with the size of the crowd but saturates, and that for small homogeneous crowds the Crown IQ significantly exceeds the IQ of even their most intelligent member. We investigate alternative ways of aggregating the responses and the impact of the aggregation method on the resulting Crowd IQ. We also discuss Contextual IQ, a method of quantifying the individual participant's contribution to the Crowd IQ based on the Shapley value from cooperative game theory. Crowd IQ - Aggregating Opinions to Boost Performance Yoram Bachrach, Thore Graepel, Gjergji Kasneci, Michal Kosinski, Jurgen Van-Gael

5C_2

In this paper we present an approach for improving the accuracy of shared opinions in a large decentralised team. Specifically, our solution optimises the opinion sharing process in order to help the majority of agents to form the correct opinion about a state of a common subject of interest, given only few agents with noisy sensors in the large team. We build on existing research that has examined models of this opinion sharing problem and shown the existence of optimal parameters where incorrect opinions are filtered out during the sharing process. In order to exploit this collective behaviour in complex networks, we present a new decentralised algorithm that allows each agent to gradually regulate the importance of its neighbours' opinions (their social influence). This leads the system to the optimised state in which agents are most likely to filter incorrect opinions, and form a correct opinion regarding the subject of interest. Crucially, our algorithm is the first that does not introduce additional communication over the opinion sharing itself. Using it 80-90% of the agents form the correct opinion, in contrast to 60-75% with the existing message-passing algorithm DACOR proposed for this setting. Moreover, our solution is adaptive to the network topology and scales to thousands of agents. Finally, the use of our algorithm allows agents to significantly improve their accuracy even when deployed by only half of the team. Efficient Opinion Sharing in Large Decentralised Teams Oleksandr Pryymak, Alex Rogershas 9 papers, Nick Jenningshas 11 papers

5C_3

In recent years, social networking sites and social media have become a very important part of peoples' lives, driving everything from family relationships to revolutions. In this work, we study the different patterns of interaction behavior seen in an online social network. We investigate the difference in the relative time people allocate to their friends versus that which their friends allocate to them, and propose a measure for this difference in time allocation. The distribution of this measure is used to identify classes of social agents through agglomerative hierarchical clustering. These classes are then characterized in terms of two important structural attributes: Degree distributions and clustering coefficients.
We demonstrate our approach on two large social networks obtained from Facebook. For each network we have the list of all social interactions that took place over six months. The total number of people in the two networks is 939,453, and 841,456 with 1.4 million and 8.4 million interactions, respectively. Our results show that, based on the interaction behavior, there are four main classes of agents in both networks, and that they are consistent across the two networks. Furthermore, each class is characterized by a specific profile of degree distributions and clustering coefficients, which are also consistent across both networks.
We speculate that agents corresponding to the four classes play different roles in the social network. To test this, we developed an opinion propagation model where opinions are represented as m-bit strings communicated from agent to agents. An agent receiving an opinion then selectively modifies its own opinions depending on the social and informational value it places upon communications from the sending agent, its overall agreement with the sending agent, and its own propensities. Opinions are injected into the system by agents of specific classes and their spread is tracked by propagating tags. The resulting data is used to analyze the influence of agents from each class in the viral spread of ideas under various conditions. The analysis also shows what behavioral factors at the agent level have the most significant impact on the spreading of ideas. Agents of Influence in Social Networks Amer Ghanem, Srinivasa Vedanarayanan, Ali Minai

5C_4

Agents make commitments towards others in order to influence others in a certain way, often by dismissing more profitable options. Most commitments depend on some incentive that is necessary to ensure that the action is in the agent's interest and thus, may be carried out to avoid eventual penalties. The capacity for using commitment strategies effectively is so important that natural selection may have shaped specialized capacities to make this possible. Evolutionary explanations for commitment, particularly its role in the evolution of cooperation, have been actively sought for and discussed in several fields, including Psychology and Philosophy. In this paper, using the tools of evolutionary game theory, we provide a new model showing that individuals tend to engage in commitments, which leads to the emergence of cooperation even without assuming repeated interactions. The model is characterized by two key parameters: the punishment cost of failing commitment imposed on either side of a commitment, and the cost of managing the commitment deal. Our analytical results and extensive computer simulations show that cooperation can emerge if the punishment cost is large enough compared to the management cost. The Emergence of Commitments and Cooperation The Anh Han , Luís Moniz Pereira , Francisco C. Santos

Session 5D – Auction & Mechanism Design

5D_1

Revenue maximization in multi-item settings is notoriously elusive. This paper studies a class of two-item auctions which we call a \emph{mixed-bundling auction with reserve prices (MBARP)}. It calls VCG on an enlarged set of agents by adding the seller -- who has reserve valuations for each bundle of items -- and a fake agent who receives nothing nor has valuations for any item or bundle, but has a valuation for pure bundling allocations, i.e., allocations where the two items are allocated to a single agent. This is a strict subclass of several known classes of auctions, including the affine maximizer auction (AMA), $\lambda$-aution, and the virtual valuations combinatorial auction (VVCA). As we show, a striking feature of MBARP is that its revenue can be represented in a simple closed form as a function of the parameters. Thus, we can solve first-order conditions on the parameters and obtain the optimal MBARP. The optimal MBARP yields significantly higher revenue than prior auctions for which the revenue-maximizing parameters could be solved for in closed form: separate Myerson auctions, pure-bundling Myerson auction, VCG, and mixed-bundling auction without reserve prices. Its revenue even exceeds that obtained via simulation within broader classes: VVCA and AMA. Mixed-bundling auctions with reserve prices Pingzhong Tang, Tuomas Sandholmhas 4 papers

5D_2

Scoring rules for eliciting expert predictions of random variables are usually developed assuming that experts derive utility only from the quality of their predictions. We study more realistic settings in which (a) the principal is a decision maker who takes a decision based on the expert's prediction; and (b) the expert has an inherent \emph{interest} in the decision. Not surprisingly, in such situations, the expert usually has an incentive to misreport her forecast to influence the choice of the decision maker. We develop a general model for this setting and introduce the concept of a \emph{compensation rule}. When combined with the expert's inherent utility for decisions, a compensation rule induces a \emph{net scoring rule} that behaves like a traditional scoring rule. Assuming full knowledge of expert utility, we provide a complete characterization of all (strictly) proper compensation rules. We then analyze the case when the expert's utility function is not fully known to the decision maker. We show bounds on: (a) expert incentive to misreport; (b) the degree to which an expert will misreport; and (c) decision maker loss in utility due to such uncertainty. These bounds depend in natural ways on the degree of uncertainty, the local degree of convexity of net scoring function, and properties of the decision maker's utility function. Finally, we briefly discuss the use of compensation rules in prediction markets. Eliciting Forecasts from Self-interested Experts: Scoring Rules for Decision Makers Craig Boutilier

5D_3

Many important problems in multiagent systems involve the allocation of multiple resources among the agents. For resource allocation problems, the well-known VCG mechanism satisfies a list of desired properties, including efficiency, strategy-proofness, individual rationality, and the non-deficit property. However, VCG is generally not budget-balanced. Under VCG, agents pay the VCG payments, which reduces social welfare. To offset the loss of social welfare due to the VCG payments, VCG redistribution mechanisms were introduced. These mechanisms aim to redistribute as much VCG payments back to the agents as possible, while maintaining the aforementioned desired properties of the VCG mechanism.
We continue the search for worst-case optimal VCG redistribution mechanisms -- mechanisms that maximize the fraction of total VCG payment redistributed in the worst case. Previously, a worst-case optimal VCG redistribution mechanism (denoted by WCO) was characterized for multi-unit auctions with nonincreasing marginal values. Later, WCO was generalized to settings involving heterogeneous items, resulting in the HETERO mechanism. This \emph{conjectured} that HETERO is feasible and worst-case optimal for heterogeneous-item auctions with unit demand. In this paper, we propose a more natural way to generalize the WCO mechanism. We prove that our generalized mechanism, though represented differently, actually coincides with HETERO. Based on this new representation of HETERO, we prove that HETERO is indeed feasible and worst-case optimal in heterogeneous-item auctions with unit demand. Finally, we conjecture that HETERO remains feasible and worst-case optimal in the even more general setting of combinatorial auctions with gross substitutes. Worst-Case Optimal Redistribution of VCG Payments in Heterogeneous-Item Auctions with Unit Demand Mingyu Guo

5D_4

In real electronic markets, each bidder arrives and departs over time. Thus, such a mechanism that must make decisions dynamically without knowledge of the future is called an \emph{online mechanism}. In an online mechanism, it is very unlikely that the mechanism designer knows the number of bidders beforehand or can verify the identity of all of them. Thus, a bidder can easily submit multiple bids (\emph{false-name bids}) using different identifiers (e.g., different e-mail addresses). In this paper, we formalize false-name manipulations in online mechanisms and identify a simple property called (value, time, identifier)-monotonicity that characterizes the allocation rules of false-name-proof online auction mechanisms. To the best of our knowledge, this is the first work on false-name-proof online mechanisms. Furthermore, we develop a new false-name-proof online auction mechanism for $k$ identical items. When $k$=1, this mechanism corresponds to the optimal stopping rule of the secretary problem where the number of candidates is unknown. We show that the competitive ratio of this mechanism for efficiency is 4 and independent from $k$ by assuming that only the distribution of bidders' arrival times is known and that the bidders are impatient. False-name-proofness in Online Mechanisms Taiki Todo, Takayuki Mouri, Atsushi Iwasakihas 4 papers, Makoto Yokoohas 4 papers

Session 5E – Game & Agent Theories

5E_1

Multiagent simulation extends the reach of game-theoretic analysis to scenarios where payoff functions can be computed from implemented agent strategies. However this approach is limited by the exponential growth in game size relative to the number of agents. Player reductions allow us to construct games with a small number of players that approximate very large symmetric games. We introduce \emph{deviation-preserving reduction}, which generalizes and improves on existing methods by combining sensitivity to unilateral deviation with granular subsampling of the profile space. We evaluate our method on several classes of random games and show that deviation-preserving reduction performs better than prior methods at approximating full game equilibria. Scaling Simulation-Based Game Analysis through Deviation-Preserving Reduction Bryce Wiedenbeck, Michael Wellmanhas 2 papers

5E_2

Boolean games are a compact and expressive class of games, based on propositional logic. However, Boolean games are computationally complex: checking for the existence of pure Nash equilibria in Boolean games is $\Sigma^p_2$-complete, and it is co-NP-complete to check whether given a given outcome for a Boolean game is a pure Nash equilibrium. In this paper, we consider two possible avenues to tractability in Boolean games. First, we consider the development of alternative solution concepts for Boolean games. We introduce the notion of $k$-bounded Nash equilibrium, meaning that no agent can benefit from deviation by altering fewer than $k$ variables. After motivating and discussing this notion of equilibrium, we give a logical characterisation of a class of Boolean games for which $k$-bounded equilibria correspond to Nash equilibria. That is, we precisely characterise a class of Boolean games for which all Nash equilibria are in fact $k$-bounded Nash equilibria. Second, we consider classes of Boolean games for which computational problems related to Nash equilibria are easier than in the general setting. We first identify some restrictions on games that make checking for beneficial deviations by individual players computationally tractable, and then show that certain types of \emph{socially desirable} equilibria can be hard to compute even when the standard decision problems for Boolean games are easy. We conclude with a discussion of related work and possible future work. Towards Tractable Boolean Games Paul Dunne, Michael Wooldridgehas 2 papers

5E_3

This article presents a population-based cognitive hierarchy model that can be used to estimate the reasoning depth and sophistication of a collection of opponents' strategies from observed behavior in repeated games. This framework provides a compact representation of a distribution of complicated strategies by reducing them to a small number of parameters. This estimated population model can be then used to compute a best response to the observed distribution over these parameters. As such, it provides a basis for building improved strategies given a history of observations of the community of agents. Results show that this model predicts and explains the winning strategies in the recent 2011 Lemonade Stand Game competition, where eight algorithms had been pitted against each other. The Lemonade Stand Game is a three-player game with simple rules that includes both cooperative and competitive elements. Despite its apparent simplicity, the fact that success depends crucially on what other players do gives rise to complex interaction patterns, which our new framework captures well. A Framework for Modeling Population Strategies by Depth of Reasoning Michael Wunder, Michael Kaisershas 2 papers, John Robert Yaros, Michael Littman

5E_4

In many multiagent domains, no single observation event is sufficient to determine that the behavior of individuals is suspicious. Instead, suspiciousness must be inferred from a combination of multiple events, where events refer to the individual's interactions with other individuals. Hence, a detection system must employ a detector that combines evidence from multiple events, in contrast to most previous work, which focuses on the detection of a single, clearly suspicious event. This paper proposes a two-step detection system, where it first detects trigger events from multiagent interactions, and then combines the evidence to provide a degree of suspicion. The paper provides three key contributions: (i) proposes a novel detector that generalizes a utility-based plan recognition with arbitrary utility functions, (ii) specifies conditions that any reasonable detector should satisfy, and (iii) analyzes three detectors and compares them with the proposed approach. The results on a simulated airport domain and a dangerous-driver domain show that our new algorithm outperforms other approaches in several settings. Detection of Suspicious Behavior from a Sparse Set of Multiagent Interactions Boštjan Kalužahas 2 papers, Gal Kaminkahas 3 papers, Milind Tambehas 13 papers

Session 5F – Logic and Verification

5F_1

Emotions is a cognitive mechanism that directs an agent's thoughts and attentions to what is relevant, important, and significant. Such a mechanism is crucial for the design of resource-bounded agents that must operate in highly-dynamic, semi-predictable environments and which need mechanisms for allocating their computational resources efficiently. The aim of this work is to propose a logical analysis of emotions and their influences on an agent's behaviour. We focus on four emotion types (viz, hope, fear, joy, and distress) and provide their logical characterizations in a model logic framework. As the intensity of emotion is essential for its influence on an agent's behaviour, the logic is devised to represent and reason about graded beliefs, graded goals and intentions. The belief strength and the goal strength determine the intensity of emotions. Emotions trigger different types of coping strategy which are aimed at dealing with emotions either by forming or revising an intention to act in the world, or by changing the agent's interpretation of the situation (by changing beliefs or goals). A logic of emotions: from appraisal to coping Mehdi Dastanihas 4 papers, Emiliano Lorini

5F_2

We present a methodology for the automatic verification of multi-agent systems against temporal-epistemic specifications derived from higher-level languages defined over convergent equational theories. We introduce a modality called \emph{rewriting knowledge} that operates on local equalities. We discuss the conditions under which its interpretation can be approximated by a second modality that we introduce called \emph{empirical knowledge}. Empirical knowledge is computationally attractive from a verification perspective. We report on an implementation of a technique to verify this modality with the open source model checker \textsc{MCMAS}. We evaluate the approach by verifying multi-agent models of electronic voting protocols automatically extracted from high-level descriptions. Automatic Verification of Epistemic Specifications under Convergent Equational Theories Ioana Boureanu, Andrew Joneshas 2 papers, Alessio Lomusciohas 2 papers

5F_3

Information-Based Interaction-Oriented Programming, specifically as epitomized by the Blindingly Simple Protocol Language (BSPL), is a promising new approach for declaratively expressing multi-agent protocols. BSPL eschews traditional control flow operators and instead emphisizes causality and integrity based solely on the information models of the messages exchanged. BSPL has been shown to support a rich variety of practical protocols and can be realized in a distributed asynchronous architecture wherein the agents participating in a protocol act based on local knowledge alone. The flexibility and generality of BSPL mean that it needs a strong formal semantics to ensure correctness as well as automated tools to help develop protocol specifications.
We provide a formal semantics for BSPL and formulate important technical properties, namely, enactability, safety, and liveness. We further describe our declarative implementation of the BSPL semantics as well as of verifiers for the above properties using a temporal reasoner. We have validated our implementation by verifying the correctness of several protocols of practical interest. Semantics and Verification of Information-Based Protocols Munindar Singhhas 3 papers