Predicting Competitive Pokémon VGC Leads Using Latent Semantic Analysis


Bruno Luvizotto Carli

Universidade Federal do Paraná, Curitiba, PR, Brazil.

brunolcarli (at) gmail (dot) com

Competitive Pokémon battles often hinge on the initial selection of Pokémon leads. Anticipating an opponent’s lead choice offers a tactical edge, particularly in high-stakes matches. In this paper, I explore the use of Latent Semantic Analysis (LSA), a natural language processing algorithm, applied to over 5,000 Pokémon Showdown battle logs, to predict likely lead pairs based on team compositions. Evaluated against the Top 8 bracket of the North America International Championships (NAIC) 2025, the model achieved promising results, showcasing the potential of unsupervised learning in strategic game prediction.

POKÉMON VGC

The Video Game Championships (VGC) are the official competitive format organized by The Pokémon Company International through its Play! Pokémon program. These events culminate in the annual Pokémon World Championships, which bring together top players from around the globe in various formats including the video games, the trading card game, and select spin-off titles.

In VGC, battles are conducted in a Double Battle format where each player brings six Pokémon, but selects only four to battle. Two are sent out as leads, and two remain in the back for switching. The game begins with a team preview phase, during which players see their opponent’s six Pokémon and their move sets and held items (Bulbapedia, 2025).

LATENT SEMANTIC ANALYSIS

Latent Semantic Analysis (LSA) is a technique from natural language processing used to discover relationships between documents and terms by mapping them into a lower-dimensional space via Singular Value Decomposition (SVD). In this transformed space, documents can be compared based on the semantic similarity of their content. Originally developed for text retrieval and indexing, LSA (or LSI when used in information retrieval contexts) works by converting documents into a term-document matrix and reducing it to capture the most important relationships (Foltz, 1996; Wikipedia, 2025).
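In matrix form, the reduction described above can be sketched as follows. The symbols (X for the term-document matrix, k for the number of retained dimensions, q for a new document) are notational choices made here for illustration and are not taken from the cited sources.

```latex
% X: term-document matrix with m terms (rows) and n documents (columns)
X = U \Sigma V^{\top}
\quad\Longrightarrow\quad
X_k = U_k \Sigma_k V_k^{\top}, \qquad k \ll \min(m, n)

% A new document q \in \mathbb{R}^{m} is folded into the k-dimensional space:
\hat{q} = \Sigma_k^{-1} U_k^{\top} q

% Two reduced vectors a and b are compared by cosine similarity:
\cos(a, b) = \frac{a^{\top} b}{\lVert a \rVert \, \lVert b \rVert}
```

Under this fold-in, the j-th training document maps to the j-th row of V_k, so a new document and the training documents live in the same reduced space and can be compared directly.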

WHY IS A GOOD LEAD SELECTION RELEVANT IN VGC?

This question defines the core intention of this study. Every match starts in the team preview phase, where each player decides which four Pokémon will take part in the battle and which two are left out. This decision matters because the four selected Pokémon must work together well enough to overwhelm the opponent’s team. A lead pair with good synergy puts pressure on the opponent’s side of the board, and a player who correctly anticipates the opponent’s leads can select two Pokémon able to counter and check them, gaining an early advantage in the match.

As Zheng (2020) states, “picking the proper Pokémon in team preview can give you a major advantage before the game even starts”, so the team preview plays a decisive part in the match. It is one of the most difficult parts of the game, and there is no easy answer or cookbook for a perfect choice that will always work; not even the best players get it right 100% of the time.

Aaron Zheng emphasized that an effective lead not only applies early pressure but also aligns with the broader strategic intent of the team. Through mental flowcharts and recognition of synergy-based combinations (e.g., Speed control + Attacker, Redirection + Setup), players can anticipate the likely opposing leads and respond with countermeasures that tilt the battle in their favor. Also, the attempt to anticipate the opponent’s strategy (particularly their most threatening combinations) forms the basis of lead prediction.

This capacity to predict an opponent’s decisions is foundational for algorithmic approaches aimed at narrowing viable options and improving the player’s lead selection. Leads are not merely chosen in isolation, but evaluated in the context of synergy, threat coverage, and their role in the broader battle plan (Zheng, 2020).

The predictive paradigm is reinforced by Traylor (apud Zheng, 2020) as an important part of the game, which supports the idea of a narrative-based framework in which players simulate possible match leads during team preview.

METHODOLOGY

Data collection

Data was sourced from publicly available battle logs hosted on Pokémon Showdown, an online battle simulator used extensively by the competitive community to test and train teams. The replay logs can be accessed via a URL trick: appending “.log” to the address of a replay page returns the raw text file containing detailed information about each battle. This includes: (1) the complete teams for each player (six Pokémon each); (2) the two initial Pokémon (leads) sent out at the start of battle (Pokémon Showdown, 2025).

A scraping script was written in Python, using the requests library, to collect over 5,000 logs and extract team and lead data (available in Carli, 2025a).
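As an illustration of the extraction step, the following minimal sketch parses teams and leads from a log. It is not the author's script: the sample log is hypothetical and heavily shortened, the function name parse_log is invented here, and the code assumes the simulator's pipe-delimited line format in which "poke" lines list team members during preview and the "switch" lines before the first "turn" line are the leads.

```python
from collections import defaultdict

# Hypothetical, heavily shortened log (real teams have six Pokémon each).
SAMPLE_LOG = """\
|player|p1|Alice|1|
|player|p2|Bob|2|
|poke|p1|Incineroar, L50, M|
|poke|p1|Rillaboom, L50, F|
|poke|p1|Miraidon, L50|
|poke|p2|Amoonguss, L50, M|
|poke|p2|Calyrex-Shadow, L50|
|poke|p2|Urshifu, L50, M|
|start
|switch|p1a: Incineroar|Incineroar, L50, M|100/100
|switch|p1b: Miraidon|Miraidon, L50|100/100
|switch|p2a: Calyrex|Calyrex-Shadow, L50|100/100
|switch|p2b: Urshifu|Urshifu, L50, M|100/100
|turn|1
|switch|p1a: Rillaboom|Rillaboom, L50, F|100/100
"""


def parse_log(text):
    """Return (teams, leads) as dicts keyed by player id ("p1", "p2")."""
    teams = defaultdict(list)
    leads = defaultdict(list)
    for line in text.splitlines():
        parts = line.split("|")
        if len(parts) < 2:
            continue
        kind = parts[1]
        if kind == "turn":
            break  # everything sent out before turn 1 is a lead
        if kind == "poke" and len(parts) >= 4:
            # "Incineroar, L50, M" -> "Incineroar"
            teams[parts[2]].append(parts[3].split(",")[0].strip())
        elif kind == "switch" and len(parts) >= 4:
            player = parts[2][:2]  # "p1a: Incineroar" -> "p1"
            leads[player].append(parts[3].split(",")[0].strip())
    return dict(teams), dict(leads)


teams, leads = parse_log(SAMPLE_LOG)
print(leads)
```

In practice the text would come from the simulator rather than a string, for example via requests.get(url + ".log").text, where url stands for a replay page address.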

Data filtering

To match a real-world competitive context, logs were filtered to include only those where at least one team used six Pokémon from the set of species seen in the NAIC 2025 Masters Top 8 bracket (Figs. 1, 2). This produced a refined dataset of 1,174 battle logs. Figure 3 lists the 30 most frequent Pokémon in the training set, while Figure 4 lists the frequencies of all Pokémon in the teams used by top cut players (data from LabMaus, 2025).
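The filter can be sketched as below. This is one reading of the rule, namely that a battle is kept when all six Pokémon of at least one team belong to the Top 8 species pool; the function names are hypothetical and single letters stand in for species names.

```python
def team_in_pool(team, pool):
    """True if the team has six members and all of them are in the pool."""
    return len(team) == 6 and set(team) <= pool


def filter_battles(battles, pool):
    """Keep battles where at least one of the two teams is fully in the pool."""
    return [
        (team_a, team_b)
        for team_a, team_b in battles
        if team_in_pool(team_a, pool) or team_in_pool(team_b, pool)
    ]


# Placeholder data: letters stand in for species names.
pool = set("ABCDEFGH")
battles = [
    (list("ABCDEF"), list("UVWXYZ")),  # first team fully in the pool: kept
    (list("ABCDEZ"), list("UVWXYZ")),  # neither team fully in the pool: dropped
]
kept = filter_battles(battles, pool)
print(len(kept))
```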

Model input format

As stated by Zheng (2020), the team preview is heavily influenced by the team each player is using. To find patterns in the selected leads, the model therefore needs a structure containing both teams in the match, so each battle instance is represented as a string: “poke1 poke2 poke3 poke4 poke5 poke6 VS poke1 poke2 poke3 poke4 poke5 poke6”.

Here, the left side of the ‘VS’ represents the user’s team and the right side represents the opponent. The model is trained to predict the opponent’s most likely two-Pokémon lead.
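A minimal sketch of building this string is shown below. The helper names are hypothetical, and the normalization step (lowercasing and joining multi-word species names with a hyphen) is an assumption added here so that whitespace tokenization keeps each Pokémon as a single token; the original notebook may handle names differently.

```python
def normalize(name):
    """Turn a species name into a single whitespace-free token (assumed scheme)."""
    return name.strip().lower().replace(" ", "-")


def battle_string(user_team, opponent_team):
    """Build the 'team VS team' string, user's team on the left."""
    left = " ".join(normalize(p) for p in user_team)
    right = " ".join(normalize(p) for p in opponent_team)
    return f"{left} VS {right}"


# Shortened two-Pokémon teams, for illustration only.
example = battle_string(["Flutter Mane", "Incineroar"], ["Iron Hands", "Amoonguss"])
print(example)
```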

Implementation

The model was implemented using Gensim’s LSI model with a Bag-of-Words representation. Lead prediction was based on the cosine similarity between the vector of the input battle string and the vectors of the battles seen during training.

The wrapper function that invokes the model takes a parameter named max_preds, which sets how many candidate lead pairs the model returns for the opponent’s side. This makes it possible to measure how many predictions the model needs before one of them matches the lead a player actually selected. Scores are expected to rise as max_preds grows, since each additional prediction gives the model another chance to hit the real lead, so results for large values should be read with that bias in mind.

For example, suppose that before the final match of a tournament (the match that will crown a champion), both players are practicing with their teammates at the hotel to prepare for the decisive battle. If the algorithm is configured to predict three possible lead combinations, a player can realistically battle against each of those predicted leads over and over, which makes three an acceptable threshold. If the algorithm is configured to predict eight or ten combinations, the player is less likely to cover all of them in the time available. Evaluating the model over a range of values therefore shows how much accuracy is lost when it is restricted to fewer, more practical predictions.
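The retrieval idea can be sketched as follows. This is not the author's code: it swaps Gensim's LSI model for a plain NumPy SVD so the steps are visible, the class name LeadPredictor and the num_topics default are hypothetical, and the toy data uses letters in place of species. Only the max_preds parameter name comes from the text above.

```python
import numpy as np


def bow(tokens, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    vec = np.zeros(len(vocab))
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


class LeadPredictor:
    def __init__(self, battles, leads, num_topics=2):
        docs = [b.split() for b in battles]
        terms = sorted({t for d in docs for t in d})
        self.vocab = {t: i for i, t in enumerate(terms)}
        X = np.column_stack([bow(d, self.vocab) for d in docs])  # terms x docs
        U, s, _ = np.linalg.svd(X, full_matrices=False)
        k = min(num_topics, len(s))
        self.U_k = U[:, :k]                   # top-k left singular vectors
        self.doc_vecs = (self.U_k.T @ X).T    # each training battle in k dims
        self.leads = [tuple(sorted(pair)) for pair in leads]

    def predict(self, battle, max_preds=3):
        """Return up to max_preds distinct lead pairs, most similar first."""
        q = self.U_k.T @ bow(battle.split(), self.vocab)
        norms = np.linalg.norm(self.doc_vecs, axis=1) * np.linalg.norm(q)
        sims = (self.doc_vecs @ q) / np.where(norms == 0, 1.0, norms)
        preds = []
        for i in np.argsort(-sims):
            if self.leads[i] not in preds:
                preds.append(self.leads[i])
            if len(preds) == max_preds:
                break
        return preds


# Toy training set: each battle string is paired with the opponent's lead.
battles = ["a b c VS x y z", "a b c VS x y w", "d e f VS p q r", "d e f VS p q s"]
leads = [("x", "y"), ("x", "w"), ("p", "q"), ("p", "s")]
model = LeadPredictor(battles, leads)
top2 = model.predict("a b c VS x y z", max_preds=2)
print(top2)
```

Ranking training battles by similarity and collecting their distinct leads is one simple way to turn similarity scores into a list of max_preds candidates; other aggregation schemes (for instance, voting weighted by similarity) would fit the same description.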

Evaluation method

Predictions were evaluated by comparing model outputs with real-world leads from the NAIC 2025 Top 8 bracket. Since the leads used by each player are not explicitly shown in the website data, this information was collected by watching the recording of the live stream available on YouTube (via Núñez, 2025).

The author watched the start of every match in the top cut bracket and took notes of each lead used by the players; these leads form the validation test set included in the source code. Figure 5 shows an example of how the lead of each player was identified in a match to build the evaluation test set.

There were two possible ways to achieve a quantitative value for evaluation: (1) predicting correctly both lead Pokémon of a player; (2) predicting at least one lead Pokémon of a player.

To this end, one metric was defined for each of these possibilities. Hard Prediction: a score of 1 if both predicted leads match the real leads (order-independent), otherwise 0. Soft Prediction: a score of 1 if both predicted leads match the real leads (order-independent), 0.5 if only one predicted lead matches, otherwise 0.

This approach makes it possible to evaluate the model from two points of view: (1) a perfect prediction, which is the stricter and harder criterion; (2) a partial prediction, which is more likely to occur and gives the analysis some flexibility.

The main goal of using data from a real competition for evaluation is to test the data science approach against a real-world problem. This gives a more reliable proof of concept than an abstract benchmark and a more meaningful context for the application. The final performance metrics are the mean of these scores across all tournament matches, computed for each value in the tested parameter range. The source code and implementation are available online (see Carli, 2025b).
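The two metrics can be written down as a short sketch. The function names are hypothetical, and one detail is an assumption made here rather than something stated in the text: when the model returns several candidate pairs for a match, the match is credited with the best score among them.

```python
def hard_score(predicted, actual):
    """1.0 if both leads match (order-independent), else 0.0."""
    return 1.0 if set(predicted) == set(actual) else 0.0


def soft_score(predicted, actual):
    """1.0 if both leads match, 0.5 if exactly one matches, else 0.0."""
    hits = len(set(predicted) & set(actual))
    return {2: 1.0, 1: 0.5}.get(hits, 0.0)


def best_score(score_fn, predictions, actual):
    """Assumption: a match gets the best score over all returned candidates."""
    return max((score_fn(p, actual) for p in predictions), default=0.0)


def mean_score(score_fn, all_predictions, all_actuals):
    """Average the per-match scores over the whole test set."""
    scores = [best_score(score_fn, p, a) for p, a in zip(all_predictions, all_actuals)]
    return sum(scores) / len(scores)


# Placeholder data: three matches, letters stand in for species.
all_predictions = [[("a", "b"), ("c", "d")], [("a", "c")], [("e", "f")]]
all_actuals = [("d", "c"), ("a", "b"), ("x", "y")]
hard = mean_score(hard_score, all_predictions, all_actuals)
soft = mean_score(soft_score, all_predictions, all_actuals)
print(hard, soft)
```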

RESULTS

The model’s performance was evaluated on each of the 32 battles from the NAIC Top 8 bracket, with max_preds values ranging from one to ten. Table 1 lists the evaluation of the model for each value. The scores rise with the number of predictions, meaning that with more candidates the model was more often able to find a combination that matches the lead used by the player in the tournament (Fig. 6).

Considering the three most likely lead pairs for the opponent, the overall Hard Prediction (both leads correct) scored 62.50%, while the overall Soft Prediction (partial credit when only one lead is correct) scored 81.25%. As the number of predictions grows, the algorithm finds a match for the leads used in almost every game, reaching a Hard Prediction score of 90.63% and a Soft Prediction score of 95.31%.

Table 2 details the midpoint setting (five predictions) for every match in the NAIC 2025 Top 8 bracket, with overall scores of Hard Prediction = 68.75% and Soft Prediction = 84.38%. In the finals, where players have the most time to prepare and study opponent teams, the scores were highly accurate even for the three most likely leads (Table 3): Hard Prediction = 83.33%; Soft Prediction = 91.66%.

These results suggest that even a simple unsupervised algorithm can provide meaningful insights in a competitive context.

DISCUSSION

While the model does not capture all the nuance of VGC gameplay (e.g., movesets, synergy, in-game momentum), it offers a surprising amount of strategic value simply by analyzing team compositions from previous simulator matches, and it confirms the existence of a pattern in the leads players are most likely to pick for a given matchup. It can serve as a scouting tool or sparring assistant.

Furthermore, the methodology does not try to model the entire set of logs initially collected from the data source. Instead, it focuses on the specific set of Pokémon present in the teams of the players who qualified for the top cut, and on a single impactful decision point: the lead selection. This narrow focus keeps the modeling tractable and meaningful, and the model shows a capacity for pattern recognition on well-defined matchups, particularly when it is allowed a higher number of predicted combinations.

Of course, Pokémon VGC is challenging and complex in many ways: the skill and experience of the players, and the mind games involved in tricking opponents with unpredictable choices, are factors the algorithm still does not understand. But competitions are overwhelming even for the most experienced players, and scouting tools like this one can help players start identifying common plays without too much stress, especially in the finals, where players have more knowledge of their opponents and more time to practice and prepare.

By grounding the evaluation in a real-world tournament with known outcomes and high stakes, this study demonstrates practical relevance and not just theoretical performance. The application has potential to become a promising tool for preparation before matches in the future.

CONCLUSION

This study presented an innovative application of LSA to the Pokémon VGC context, demonstrating that unsupervised semantic models can support competitive decision-making in eSports. While not state of the art in terms of algorithmic sophistication, the work is novel in its domain adaptation and bridges data science with strategic gameplay.

FURTHER WORK

Future improvements may include supervised refinement, incorporation of move and item metadata, and broader meta-context generalization. Predicting not only the lead but all four picks for a match would also enable more complex analysis of matchups. Another useful angle is to predict which Pokémon a player is most likely to leave out of a game because of a disadvantageous matchup. A last variation is to look for strengths and weaknesses among team compositions in order to optimize the coverage and synergy of the six Pokémon during teambuilding, a point made by Zheng (2020): “stronger teams will allow you to have more options during team preview. Bad match-ups will lead to more difficult team preview phases, and you’ll occasionally be in situations where you don’t have any good leads. Selecting a strong team will make team preview easier for you.”

REFERENCES

Bulbapedia. (2025) World Championships. Bulbapedia. Available from: https://bulbapedia.bulbagarden.net/wiki/World_Championships (Date of access: 25/Jun/2025).

Carli, B.L. (2025a) VGC Pokémon Showdown Battle Logs. Kaggle. Available from: https://www.kaggle.com/datasets/brunolcarli/vgc-pokmon-showdown-battle-logs (Date of access: 25/Jun/2025).

Carli, B.L. (2025b) Pokémon VGC Leads Prediction with LSA. Kaggle. Available from: https://www.kaggle.com/code/brunolcarli/pokemon-vgc-leads-prediction-with-lsa/notebook (Date of access: 25/Jun/2025).

Foltz, P. (1996) Latent Semantic Analysis for text-based research. Behavior Research Methods 28: 197–202.

LabMaus. (2025) North America International Championships 2025. LabMaus. Available from: https://labmaus.net/tournaments/30842/8 (Date of access: 25/Jun/2025).

Núñez, A. (2025) 2025 North America International Championships. Victory Road. Available from: https://victoryroad.pro/2025-naic/ (Date of access: 25/Jun/2025).

Pokémon Showdown. (2025) Pokémon Showdown! battle simulator. Pokémon Showdown. Available from: https://pokemonshowdown.com/ (Date of access: 25/Jun/2025).

Wikipedia. (2025) Latent semantic analysis. Wikipedia. Available from: https://en.wikipedia.org/wiki/Latent_semantic_analysis (Date of access: 25/Jun/2025).

Zheng, A. (2020) Team Preview. VGC Guide. Available from: https://www.vgcguide.com/team-preview (Date of access: 25/Jun/2025).

Acknowledgements

ChatGPT (OpenAI) was used to improve the writing style of this article. The author reviewed, edited, and revised the ChatGPT-generated texts to his own liking and takes ultimate responsibility for the content of this publication.

About the Author

Bruno L. Carli is a Brazilian software developer with a bachelor’s degree in Software Engineering (Unicesumar, 2020) and a postgraduate specialization in Applied Artificial Intelligence (Universidade Federal do Paraná-UFPR, 2023). He is a Pokémon VGC player who is passionate about video games and technology.
