Voice Hearthstone
Download Binary
Download Source
Voice Hearthstone Project, also known as "Manuela Plays Hearthstone" in honor of a previous voice assistant project, is a personal challenge I took on for my own professional development and as an excuse to work with different technologies.
The program lets the user play Blizzard's game Hearthstone using customizable voice commands. It runs in the background and listens to the microphone for speech, which it turns into mouse movements and clicks on the game's screen. To make the commands easier to give, a numeric overlay is shown over the cards while the program is running.
PROGRAMMING STEPS
The idea was to keep the program simple for the user. For example, if the player wanted to play the second card in their hand (a spell) on a specific minion, they would just say "my hand 2, enemy minion 4". In the end, the program just translates specific voice commands into mouse movements and clicks. However, the position of the cards on the screen varies depending on how many cards there are, among other things, so the program has to track the state of the game in order to work out the correct positions.
Log Tracker:
My first step was to know which cards and minions were on screen at any given time. For that, I had to track a log that Hearthstone generates with plenty of information and filter out the useful lines. In my case, all log lines referring to "Zone" had information about when a card was going into a hand, played to a side of the table, moved to the graveyard, etc. All this information is filtered by the Log Tracker and then passed to the next system to be processed.
- Types of filters used (with regular expressions):
  - Detect where the game starts: "GameAccountId"
  - A card has moved from one zone to another: "^.*\[name=(.*) id=(.*) zone=.*\] (.*) from (.*) -> (.*)$"
  - A card is being discovered: "^.*entity=\[name=(.*) id=(.*) zone=SETASIDE.*$"
  - Discovery phase has finished: "^.*tag=NUM_OPTIONS_PLAYED_THIS_TURN.*$"
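As a rough illustration of how these filters could be applied, here is a minimal Python sketch. It is not the project's actual code. The patterns are copied from the list above, while the `classify` function, the event names and the sample line are assumptions made up for the example. The sample line is shaped to fit the zone-change pattern and is not copied from a real log.

```python
import re

# Patterns copied from the filter list above.
GAME_START = re.compile(r"GameAccountId")
ZONE_CHANGE = re.compile(r"^.*\[name=(.*) id=(.*) zone=.*\] (.*) from (.*) -> (.*)$")
DISCOVER = re.compile(r"^.*entity=\[name=(.*) id=(.*) zone=SETASIDE.*$")
DISCOVER_END = re.compile(r"^.*tag=NUM_OPTIONS_PLAYED_THIS_TURN.*$")


def classify(line):
    """Return (event_name, captured_groups) for a log line, or None if no filter matches.

    The event names are hypothetical labels chosen for this sketch.
    """
    line = line.rstrip("\n")
    match = ZONE_CHANGE.match(line)
    if match:
        return ("zone_change", match.groups())
    match = DISCOVER.match(line)
    if match:
        return ("discover", match.groups())
    if DISCOVER_END.match(line):
        return ("discover_end", ())
    if GAME_START.search(line):
        return ("game_start", ())
    return None


# Hypothetical line, written only to exercise the zone-change pattern.
sample = "[name=Fireball id=23 zone=HAND] zone from FRIENDLY DECK -> FRIENDLY HAND"
print(classify(sample))
```

Lines that match none of the filters return `None`, so the tracker can simply drop them before anything reaches the next system.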
Game Tracker:
Using the raw information from the Log Tracker, the Game Tracker is in charge of arranging all this data and representing the state of the game: the cards a player has in hand, the minions on both sides of the table and even the cards that are being discovered or mulliganed.
- Types of zones (every card has one of each):
  - By player: Friendly Zone, Opposing Zone, Unknown Zone
  - By place: Deck, Hand, Play, Hero Power, Graveyard...
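One possible way to model this state is sketched below in Python. It is only an illustration under assumptions: the `Side`, `Place` and `GameState` names are hypothetical, the `Place` enum covers only the places named above, and new cards are simply appended to the end of a zone, whereas the real game may insert them at a specific position.

```python
from dataclasses import dataclass, field
from enum import Enum


class Side(Enum):
    FRIENDLY = "FRIENDLY"
    OPPOSING = "OPPOSING"
    UNKNOWN = "UNKNOWN"


class Place(Enum):
    DECK = "DECK"
    HAND = "HAND"
    PLAY = "PLAY"
    HERO_POWER = "HERO POWER"
    GRAVEYARD = "GRAVEYARD"


@dataclass
class GameState:
    # Maps a (Side, Place) tuple to the ordered list of card ids in that zone.
    zones: dict = field(default_factory=dict)

    def move(self, card_id, source, target):
        """Move a card between zones; source and target are (Side, Place) tuples or None."""
        if source is not None and card_id in self.zones.get(source, []):
            self.zones[source].remove(card_id)
        if target is not None:
            # Simplification: always append at the end of the target zone.
            self.zones.setdefault(target, []).append(card_id)

    def cards_in(self, side, place):
        """Return a copy of the ordered card ids currently in the given zone."""
        return list(self.zones.get((side, place), []))


state = GameState()
hand = (Side.FRIENDLY, Place.HAND)
board = (Side.FRIENDLY, Place.PLAY)
state.move("23", None, hand)   # card drawn
state.move("31", None, hand)   # another card drawn
state.move("23", hand, board)  # first card played to the table
print(state.cards_in(Side.FRIENDLY, Place.HAND))  # ['31']
```

Keeping each zone as an ordered list is what makes a command such as "my hand 2" resolvable: the number is just a 1-based index into the list.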
Voice Manager:
This system activates the Windows speech recognition engine and loads a customizable grammar. Once the grammar is built, the Voice Manager keeps listening to the player's sentences and, when one matches, queues the command for the next system.
The grammar is prepared to work with different cultures. Each of them needs its own text file, for example "en-US.txt", with content similar to this:
MY_HAND = my hand
MY_TABLE = my table
MY_HERO = my hero, me
ENEMY_TABLE = enemy table, his table, her table
ENEMY_HERO = enemy hero, his hero, her hero, face
HERO_POWER = hero power
DISCOVER = discover, choose
DISCOVER_SHOW_HIDE = show, hide
READY = ready
CANCEL = cancel
FULL_ENEMY_FACE = all face, everybody face, full face
MULLIGAN = mulligan, change
MULLIGAN_ACCEPT = confirm mulligan
In a different language, like Spanish, the file "es-ES.txt" would contain "MY_HAND = mi mano", etc.
Each line of the file has the format "keyword = sentence1, sentence2, ...", so the Voice Manager recognizes any of the sentences and maps it to its keyword. As shown, it can be configured so that different sentences map to the same keyword. For example, saying "enemy hero" or "face" both produce the keyword ENEMY_HERO, and the program clicks on the enemy hero's portrait.
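The file format described above is simple enough to parse in a few lines. The following Python sketch shows one way to do it. The `load_grammar` function is a hypothetical helper written for this example, not the program's own loader, and it only builds the sentence-to-keyword mapping (it does not talk to any speech engine).

```python
def load_grammar(text):
    """Parse 'KEYWORD = sentence1, sentence2' lines into a {sentence: keyword} dict."""
    phrases = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        keyword, _, sentences = line.partition("=")
        for sentence in sentences.split(","):
            sentence = sentence.strip().lower()
            if sentence:
                phrases[sentence] = keyword.strip()
    return phrases


# Two lines taken from the en-US example above.
EN_US = """\
MY_HAND = my hand
ENEMY_HERO = enemy hero, his hero, her hero, face
"""

grammar = load_grammar(EN_US)
print(grammar["face"])        # ENEMY_HERO
print(grammar["enemy hero"])  # ENEMY_HERO
```

Because the lookup is keyed by sentence, supporting another culture only means loading a different file. The keywords, and everything downstream of them, stay the same.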
Zone To Screen Manager:
This system is in charge of mapping the available voice commands to screen coordinates in order to move the mouse and click on cards. On the one hand, it receives the commands to process from the Voice Manager. On the other hand, it calculates the current relative screen coordinates based on the information provided by the Game Tracker. Once the final pixel coordinates have been calculated for a specific command, the mouse is moved there and a click event is sent to the Hearthstone window handle.
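The coordinate step can be sketched as follows, in Python. This is an illustration under stated assumptions, not the program's real layout math: the `slot_x` and `to_pixels` helpers are hypothetical, and the `center` and `spacing` values are placeholders. The real game fans the hand and changes the spacing with the number of cards, so actual values would have to be measured.

```python
def slot_x(index, count, center=0.5, spacing=0.08):
    """Relative x (0..1) of the 1-based slot `index` among `count` evenly spaced, centered slots.

    `center` and `spacing` are placeholder values, not measured from the game.
    """
    if not 1 <= index <= count:
        raise ValueError("index out of range")
    # Offset of this slot from the middle of the row, in slot widths.
    offset = (index - 1) - (count - 1) / 2
    return center + offset * spacing


def to_pixels(rel_x, rel_y, window):
    """Convert relative coordinates to absolute pixels; window is (left, top, width, height)."""
    left, top, width, height = window
    return (left + round(rel_x * width), top + round(rel_y * height))


# Middle card of a three-card hand, in a 1920x1080 window at the screen origin.
x = slot_x(2, 3)
print(to_pixels(x, 0.9, (0, 0, 1920, 1080)))  # (960, 972)
```

Working in relative coordinates first and converting to pixels last is one way to keep the same layout rules usable across window sizes.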
Overlay Window:
Using WPF, I created a very simple overlay window that numbers the cards and minions so the player doesn't need to count them. When the player's hand is full, it is sometimes hard to tell whether a card is the fourth, the fifth...
PROS, CONS AND THOUGHTS
The main positive thing about Voice Hearthstone, apart from allowing anyone to play hands-free, is how much I learned from it: analyzing and understanding Hearthstone logs, implementing multithreaded processes that interact with third-party programs, graphic overlays and multi-language speech recognition are not things I do every day.
There are still many cons, of course: multi-language support is not well tested, screen coordinates may not match properly at very different resolutions, speech recognition relies on Windows (which means it sometimes does not work if it has not been trained), Hearthstone needs to be configured to produce the logs, some card effects are not yet tracked (like Jaraxxus replacing your hero and not staying on the table), etc.
Why use numbers for the cards instead of their names? I thought about this, but each language has different names, so all of them would have to be mapped either by hand or through a database. Even though accessing a Hearthstone database would be nice, numbers would still be needed when there is more than one card of the same kind in play, so I decided to go the straightforward way.
Why not track the whole thing like Hearthstone Deck Tracker or similar apps? The purpose of this program was solely to play using voice commands. Other functionality could be nice to add in the future if I feel like it.
Why not extend a program like Hearthstone Deck Tracker with this feature, as it provides tools for that? This is something I started exploring, but in the end I decided to go my own way from scratch. The reason is very simple, and I will use the same words a friend of mine once told me: "If you don't do the whole thing, where's the fun?"