Ensuring quality in modern video games is a growing challenge for game companies. Classically, this challenge was met by game testers constantly playing the game or by creating scripted behaviors to validate that a game looks and works as expected. However, as games grow in both size and complexity, it becomes infeasible to depend on human testers and scripted behaviors alone to confirm quality. Using Reinforcement Learning (RL) to complement current approaches in game testing is a promising technique, as RL agents can learn directly from playing the game without the need for human intervention. However, despite the exciting achievements of applying reinforcement learning to AAA games, these efforts have required an enormous amount of time and data along with an equally impressive amount of engineering. Considering that AAA game development is already a colossal undertaking, minimizing the engineering cost of training and using RL agents for automated testing is vital to the success of the game project. To further complicate the issue, the game being tested is under active development: game functionality can be added, changed, or removed independently of the game visuals, which themselves undergo constant, rapid, and dramatic iterations. Thus, to use RL methods to address the growing problem of ensuring quality AAA games, models must be cheap to train and robust to change.
Much of RL research focuses on domains with a fixed, predefined observation space. For example, early research in reinforcement learning with deep learning (deep RL) involved learning to play Atari 2600 games directly from the pixel rendering of the game. Research on these games led to many breakthroughs in how to train RL agents with high-dimensional input such as images.
This is a video of a random agent playing the video game known as “Breakout” which was originally implemented on the Atari 2600 game console.
Of the many lessons learned from this line of research, one became clear: the representation of observations can significantly affect the agent's behavior. In practical terms, the choice of what an agent can observe inherently shapes the kinds of behavior that the agent can demonstrate. For self-driving cars, this wisdom manifests in the use of expensive LIDAR and camera sensors to make the on-board machine learning models more aware of their surroundings. For video games, however, all sensors available to RL agents are virtual, so the only restriction on observations is their computational expense. This is evident in ML-Agents, the popular Unity repository, in which virtual sensors collect observations for training and executing models.
Up until late September of 2020, there were only two kinds of sensors available in ML-Agents: cameras and raycasts. Like the Atari environment, the camera sensor provides a grayscale or RGB image as input to the agent, whereas a raycast allows agents to observe and collect data about a game object down a line of sight. With direct access to the game object, the developer controls which of its attributes, such as health, are returned and observed by the RL agent. Although both types of sensors have been used in a variety of environments, they have limitations when training agents. A detailed description of these limitations is included in a Unity Blog article describing how Eidos–Labs, along with collaborators from Matsuko, developed the Grid Sensor specifically to address these concerns.
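To make the raycast-style observation concrete, here is a rough sketch in Python (the actual ML-Agents sensors are written in C#). The tag list, helper name, and encoding below are assumptions made for illustration, but they capture the general shape of such an observation: one normalized hit distance plus a one-hot tag encoding per ray.

```python
import numpy as np

# Illustrative sketch only -- not the ML-Agents raycast sensor implementation.
# Each ray reports a one-hot encoding of the tag it hit plus a normalized hit
# distance, which is roughly the shape of observation a raycast-style sensor produces.
DETECTABLE_TAGS = ["Enemy", "Collectable", "Wall"]  # hypothetical tag set

def encode_ray_hit(hit_tag, hit_fraction):
    """Encode one ray as [one-hot tag ..., no-hit flag, normalized distance]."""
    obs = np.zeros(len(DETECTABLE_TAGS) + 2, dtype=np.float32)
    if hit_tag in DETECTABLE_TAGS:
        obs[DETECTABLE_TAGS.index(hit_tag)] = 1.0
    else:
        obs[len(DETECTABLE_TAGS)] = 1.0           # nothing detectable was hit
    obs[-1] = hit_fraction                        # distance as a fraction of the ray length
    return obs

# A full observation concatenates the encodings of every ray the sensor casts.
rays = [("Enemy", 0.4), (None, 1.0), ("Wall", 0.8)]
observation = np.concatenate([encode_ray_hit(tag, frac) for tag, frac in rays])
```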
The Grid Sensor combines the generality of data extraction from raycasts with the computational efficiency of Convolutional Neural Networks (CNNs). The Grid Sensor collects data from GameObjects by querying their physics properties and then structures the data into a “height x width x channel” matrix. This matrix is analogous to an image from an orthographic camera, but rather than representing the red, green, and blue color values of objects, the “pixels” (or cells) of a Grid Sensor represent a vector of arbitrary data about objects relative to the sensor. Another benefit is that the grid can have a lower resolution than an image, which can improve training times. This matrix can then be fed into a CNN and used either for data analysis or to train RL agents.
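As a rough illustration of the idea (in Python rather than the C# of the actual implementation), the sketch below bins nearby objects into grid cells and writes a one-hot tag vector into each cell, producing the height x width x channel matrix described above. The tag list, grid dimensions, and helper names are assumptions made for the example, not the Grid Sensor's real API.

```python
import numpy as np

# Illustrative sketch only -- not the actual Grid Sensor code from ML-Agents.
# Objects near the sensor are binned into grid cells, and each cell stores a
# small data vector (here, a one-hot over hypothetical tags), yielding a
# height x width x channel matrix that a CNN can consume like an image.
TAGS = ["Enemy", "Collectable"]          # hypothetical tag channels
GRID_SIZE, CELL_SIZE = 16, 1.0           # 16x16 cells, 1 world unit per cell

def build_grid_observation(sensor_pos, detected_objects):
    """detected_objects: list of ((x, z) world position, tag) pairs near the sensor."""
    grid = np.zeros((GRID_SIZE, GRID_SIZE, len(TAGS)), dtype=np.float32)
    half = GRID_SIZE * CELL_SIZE / 2.0
    for (x, z), tag in detected_objects:
        # Position of the object relative to the sensor, mapped to a cell index.
        col = int((x - sensor_pos[0] + half) / CELL_SIZE)
        row = int((z - sensor_pos[1] + half) / CELL_SIZE)
        if 0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE and tag in TAGS:
            grid[row, col, TAGS.index(tag)] = 1.0
    return grid   # analogous to an image, but channels hold game data, not colors

obs = build_grid_observation((0.0, 0.0),
                             [((3.2, -1.5), "Enemy"), ((-4.0, 2.5), "Collectable")])
```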
For information about the Grid Sensor implementation in Unity, see Eidos–Labs’s Pull Request on the ML-Agents repository.
Beyond the advantages of faster training times and smaller models, there are less obvious advantages to using the Grid Sensor for RL agents. The Grid Sensor uses tags, such as Enemy or Collectable, to filter which aspects of a game should be observed. When abstractions like Enemies and Collectables can be identified across different games, porting the agent to a new game requires minimal engineering, leaving the focus on hyperparameter tuning. Further, as the Grid Sensor can be attached to a Character controlled by a human, it provides a straightforward way to collect data from play sessions, as sketched below. This data can then be used for purposes beyond testing, such as understanding game dynamics, predicting player behavior, and learning policies via imitation, all while remaining independent of game visuals.
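As a hedged example of what such data collection could look like, the Python sketch below logs an (observation, action) pair each frame while a human plays, so the data can later be replayed for analysis or imitation learning. The class name and the `build_grid_observation` helper it assumes come from the earlier sketch; neither is part of ML-Agents.

```python
import numpy as np

# Illustrative sketch only: logging Grid Sensor observations during a human play
# session so they can later be used for analysis or imitation learning.
class PlaySessionRecorder:
    def __init__(self):
        self.observations, self.actions = [], []

    def record_frame(self, grid_obs, player_action):
        """Store one frame of (observation, action) from a human-controlled character."""
        self.observations.append(grid_obs)
        self.actions.append(player_action)

    def save(self, path):
        # Persist the whole session as compressed arrays for offline use.
        np.savez_compressed(path,
                            observations=np.stack(self.observations),
                            actions=np.array(self.actions))
```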
Recently, the Grid Sensor was ported to Unreal, validating its prior success in Unity. During this rewrite, it became evident that querying an object by its type, rather than by its tag, takes advantage of the class hierarchies of objects while avoiding costly string comparisons.
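The difference is easy to illustrate in a few lines of Python (Unreal's implementation relies on its own C++ class system, so the class and function names here are purely illustrative): a type check respects inheritance, whereas tag matching compares strings and knows nothing about class hierarchies.

```python
# Illustrative sketch of matching objects by string tag versus by type.
# Type checks respect class hierarchies (a Boss *is* an Enemy), whereas tag
# matching needs a string comparison per object and per detectable tag.
class Enemy: ...
class Boss(Enemy): ...
class Collectable: ...

def matches_by_tag(obj_tag, detectable_tags):
    return obj_tag in detectable_tags            # string comparisons

def matches_by_type(obj, detectable_types):
    return isinstance(obj, detectable_types)     # hierarchy-aware, no strings

print(matches_by_type(Boss(), (Enemy,)))         # True: Boss inherits from Enemy
print(matches_by_tag("Boss", ["Enemy"]))         # False: tags know nothing about inheritance
```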
Below is an example of an agent that was trained within Unreal to move towards a target within a simple gym environment. The agent is given a Grid Sensor observation as well as a vector containing the direction of the goal, the direction the agent is facing, and the current distance to the goal relative to the starting distance. The actions are keyboard inputs fed directly to the Unreal PlayerController. This minimal example is the first of many steps towards an automated game testing framework that is agnostic to game visuals and game-specific actions.
A simple gym environment implemented in the Unreal game engine which requires an agent to navigate obstacles to move towards a desired goal. The agent moves towards the goal by selecting keyboard actions which are sent to the PlayerController.
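To give a sense of how this gym's observation and action spaces fit together, here is a hedged Python sketch; the key set, dictionary layout, and callback into the PlayerController are assumptions for illustration, not the framework's actual interface.

```python
import numpy as np

# Illustrative sketch of how the Unreal gym's observation and action spaces might
# be composed -- not the actual framework. The agent sees a Grid Sensor matrix
# plus a small vector (goal direction, facing direction, distance ratio), and it
# acts by choosing from a fixed set of keyboard-like inputs.
KEY_ACTIONS = ["W", "A", "S", "D", "NONE"]       # hypothetical discrete action set

def make_observation(grid_obs, goal_dir, facing_dir, dist_ratio):
    """Combine the grid with the auxiliary vector the agent receives alongside it."""
    aux = np.concatenate([goal_dir, facing_dir, [dist_ratio]]).astype(np.float32)
    return {"grid": grid_obs, "vector": aux}

def apply_action(action_index, send_key_to_player_controller):
    """Forward the chosen key to the game, e.g. via Unreal's PlayerController input."""
    key = KEY_ACTIONS[action_index]
    if key != "NONE":
        send_key_to_player_controller(key)       # hypothetical callback into the engine
```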
Although the Grid Sensor addresses some of the issues faced when using Reinforcement Learning for automated game testing, it makes assumptions about the kinds of tasks in which it can be used. Like every tool, it is not a solution to every problem, but it does shed light on both the freedom and the challenges that machine learning has in video game applications. At Eidos–Labs, we look forward to sharing our research journey as we aim to improve the experience of developers, designers, and players.
Author
Jaden Travnik joined Eidos-Montréal in 2018 as a Machine Learning Specialist. He obtained his MSc in Computing Science in 2018 under the supervision of Dr. Patrick Pilarski, where his research focused on Reinforcement Learning for prosthetic devices. Jaden sees how thoughtful applications of Machine Learning can remove obstacles for people while giving them more control over the things that matter to them. Video game development has many such obstacles, and it is exciting to see how Machine Learning can address these challenges.