Playing Games Without Any Controllers
TL; DR ?
It’s a fast-paced world that we are living in today, and I understand that. You might ask: “What does this blog post say in a few words?” Here is the answer: I elaborate on the structure and development of the project that I have developed using Python and Unity3D game engine. This project can be thought of as the intersection of image processing and game development.
Why not take a look at the screen capture of the final results of the project, Huh?
I know the real programmers would say “Show me the code!”, but , we will get into that as well :D Follow along for explanation on the general picture of the project.
The Desire for more Interactive Game Experience
The idea of making video games more entertaining through player’s movements and interactions has been around since the early days of video games. The 1984 video game Duck Hunt might be a well example of this effort. Joysticks and game controllers that resemble steering wheel of cars are also well known examples of the desire to make the player feel inside the imaginary world of video games.
However, this old and long-standing goal of game makers, has been moved to a whole new level through the recent innovations in Virtual Reality and Augmented Reality devices and technologies, collectively referred to as XR. Let us not overlook the key role of higher processing capabilities of devices, and the advances in the Artificial Intelligence algorithms. The fast-paced advances of machine learning methods both in software and hardware terms fuel these technologies and innovations and provide an ever increasing realistic and entertaining experiences for the gamers.
As a computer science student whose research focus revolves around the ability of machines to master what humans can do a.k.a “Machine Learning”, and a former game developer and a current game development enthusiast, I took the advantage of my free time to personally explore some practical examples where the image processing aspects of machine learning intersects game development. I designed and developed a project to challenge my knowledge and experiences in the areas of my interest. I will talk about the project in a while.
Before getting into the details of this project , Let’s take a look at the final result of this project in the action:
The Big Picture
This project consists of two major components. The pose estimation component that is developed and executed in Python environment. And a 2D platformer game, that is based around the Unity game engine.
The general architecture and dataflow of this project can be depicted as follows:
Now, lets take a look at some details about each of the components:
Unity Engine Side:
There is a 2D platformer game developed in unity3D. The character is a cat that holds a gun and walks around and shoots evil creatures. The character is controlled by the information of the position of the player’s head, and hands in the captured video. If the player moves to the right side of the scene captured by the webcam, then the character turns to left side of the screen and keeps moving towards left.
If the player stands right in the middle of the screen, the character stands still. Well, codingly speaking (Yeah, there is no such word in dictionary, but I think it makes sense :D) there is a threshold for that, the central 10 percent of the image is the default threshold. If the player moves to the left side of the scene, then the character turns to right and keeps moving towards right.
The cat in the game aims and shoots based on the movements of the hands of the character. The relative position of the hands forms a vector that is used as a reference for the guns direction.
If the hands of the player are closer than a certain threshold (that can be set in menus), then the cat shoots. And as long as the hands are close together , the gun keeps shooting. The can be changed to single shot inputs easily. However, I did not implement that and I chose to set a very low number for the firing rate.
The game project in unity uses a script called PythonPlugger.cs to run the Python application (explained after in the next section) as an external process. Unity keeps reading the body pose information from the standard output buffer of the Python process. I initially used a UDP connection for this purpose, however, I found the direct read and write from the external process more handy and reliable.
Body Pose Extractor ( Python side) :
The body pose of the player is essentially determined by a predefined model for body point extraction. Facebook’s research center, has developed and published a full fledged tool with several pretrained models for various image processing tasks known as Detectron2.
I used OpenCV in order to capture the information of the webcam in Python, and then I feed the frames of the webcam’s video stream to Detectron 2, in order to extract the key points of the player’s body pose. Once the key points of each frame are extracted, they are written into the python’s standard output buffer (stdout) to be read in C#.
Despite the fact that Unity3D has recently added built-in capabilities for machine learning algorithms, I preferred to call a Python application as an external process from Unity3D for reasons that I will explain later in this post.
The code in unity engine spawns the pose extractor application as an external process. The Python code, writes the pose information to its stdout buffer, and the unity engine directly reads and uses that data for controlling the character. Feel free to check out the source codes available on my GitHub account here and here to learn more about this project.
Whys and Why nots
There a few whys and why nots about this project, that I explain in this section as pairs of questions and answers.
Q: Why haven’t I used the built-in libraries and tools provided by Unity engine in order to perform the pose estimation task?
A: In order to make a machine learning model runnable in Unity, the model should be first exported to ONNX format. Then, it can be imported in Unity3D as an asset. I tried to do this for a while. However, I realized that a few tensor operations are not supported by Unity3D’s Barracuda. Besides, I was not able to figure out the errors that I had been facing in a tolerable amount of time (We, as humans have a limited tolerance after all. Although we are programmers :D). Furthermore, initially, I wanted to develop a set of interconnected machine learning methods for performing other tasks as well. Converting to ONNX format would tighten my development flexibility since I would be working with a fixed model in Unity. Moreover, the main focus of doing this project for me lies mostly upon on the machine learning aspects of the project rather than a single and complete production ready application. So, this was the main reason that I wanted to keep working on that stuff in the Python environment and avoid the unnecessary loads of work, porting all that stuff from Python to Unity’s Barracuda.
Q: Why did I create the game assets myself? :D
A: Because it makes the project much more entertaining for me to work on. Especially, with the main character :D
There is yet a lot more to be explored for making games more entertaining for the players. In the meantime, I am developing other projects in my free time. Please feel free to reach out to me regarding this project and this blog. I fully appreciate any comments and feedbacks.
As one last note, my fellow researchers and students are also working on a Twitch channel called PatriotXR with a focus on XR (AR/VR). So feel free to follow that and provide feed back on that as well.
Resources and References:
 The Unity project of this demo is available here: https://github.com/k-timy/unity-pose-2d-control
 The pose estimation scripts are available here: https://github.com/k-timy/pose-estimation-for-unity
 Facebook’s Detectron2 main page: https://github.com/facebookresearch/detectron2/tree/master/detectron2
 The website of the Unity Barracuda: https://docs.unity3d.com/Packagesfirstname.lastname@example.org/manual/index.html
 Our Twitch channel about XR: https://www.twitch.tv/patriotxr