Overview
Trained deep reinforcement learning agents to master word-guessing games with increasing deception complexity. Built custom environments for standard Wordle and the adversarial “Fibble” variant, where the game injects 1–5 intentional lies per round of feedback, directly testing whether an agent can infer truth from corrupted signals. Achieved 99.8% win rate on clean Wordle, with performance degrading sharply as lie count increased — findings that became the empirical core of a co-authored paper published at the IEEE Conference on Games 2025.
GitHub: saideepa05/wordle-and-fibble-using-RL
What I Built
- Custom game environments — parametric Wordle and Fibble environments with configurable lie injection (1–5 deceptions per round), built to mirror the OpenAI Gym interface (environment sketch after this list)
- Q-learning agent — implemented with experience replay (standard and prioritized buffers), trained across 9 million episodes with epsilon-decay exploration (training-loop sketch below)
- Multi-action representation strategies — experimented with letter-level selections, full-word actions, and hybrid approaches to identify effective state-action encodings (action-space sketch below)
- PPO baseline — implemented Proximal Policy Optimization within an OpenAI Gym environment as an additional RL benchmark (baseline sketch below)
- LLM comparative analysis — benchmarked GPT variants against the RL agents, identifying performance gaps in logical reasoning under noisy/contradictory feedback (benchmark harness sketch below)
- Experiment tracking — integrated Weights & Biases dashboards for real-time monitoring of win rates, loss curves, and degradation across Fibble difficulty levels (logging sketch below)
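A minimal sketch of what the Fibble environment could look like, assuming a Gym-style reset/step API. The word list, reward values, and lie model (flipping the reported color at random positions) are illustrative assumptions, not the repository's actual code:

```python
import random
from typing import List, Tuple

GREEN, YELLOW, GRAY = 2, 1, 0

def score_guess(guess: str, answer: str) -> List[int]:
    """Standard two-pass Wordle feedback: 2=green, 1=yellow, 0=gray."""
    feedback = [GRAY] * 5
    remaining = list(answer)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:                       # greens first, consuming letters
            feedback[i] = GREEN
            remaining.remove(g)
    for i, g in enumerate(guess):
        if feedback[i] == GRAY and g in remaining:
            feedback[i] = YELLOW         # yellows from the leftover pool
            remaining.remove(g)
    return feedback

class FibbleEnv:
    """Wordle with `n_lies` feedback positions corrupted each round."""

    def __init__(self, words: List[str], n_lies: int = 1, max_turns: int = 6):
        self.words, self.n_lies, self.max_turns = words, n_lies, max_turns

    def reset(self) -> list:
        self.answer = random.choice(self.words)
        self.turn = 0
        return []  # empty guess/feedback history as the initial observation

    def step(self, guess: str) -> Tuple[List[int], float, bool, dict]:
        feedback = score_guess(guess, self.answer)
        # Inject lies: flip the reported color at n_lies random positions.
        for i in random.sample(range(5), self.n_lies):
            feedback[i] = random.choice([c for c in (GREEN, YELLOW, GRAY)
                                         if c != feedback[i]])
        self.turn += 1
        won = guess == self.answer
        done = won or self.turn >= self.max_turns
        reward = 1.0 if won else (-1.0 if done else 0.0)
        return feedback, reward, done, {}
```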
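A hedged sketch of the tabular Q-learning loop with uniform experience replay and epsilon-decay exploration. The hyperparameters, the history-tuple state encoding, and the uniform (rather than prioritized) buffer are illustrative assumptions:

```python
import random
from collections import deque, defaultdict

ALPHA, GAMMA = 0.1, 0.99
EPS_START, EPS_END, EPS_DECAY = 1.0, 0.01, 0.999995

def train(env, actions, episodes=9_000_000, batch_size=32):
    q = defaultdict(float)          # Q[(state, action)] -> value
    buffer = deque(maxlen=100_000)  # uniform replay; prioritized is a variant
    eps = EPS_START
    for _ in range(episodes):
        state, done = tuple(env.reset()), False
        while not done:
            if random.random() < eps:  # epsilon-greedy action selection
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            obs, reward, done, _ = env.step(action)
            next_state = state + (action, tuple(obs))
            buffer.append((state, action, reward, next_state, done))
            state = next_state
            # Replay a minibatch of stored transitions per environment step.
            for s, a, r, s2, d in random.sample(buffer, min(batch_size, len(buffer))):
                target = r if d else r + GAMMA * max(q[(s2, a2)] for a2 in actions)
                q[(s, a)] += ALPHA * (target - q[(s, a)])
            eps = max(EPS_END, eps * EPS_DECAY)
    return q
```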
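To make the action-representation contrast concrete, here is an illustrative comparison of the two extremes mentioned above; the project's exact encodings are not reproduced here:

```python
import string

# Full-word actions: one discrete action per legal five-letter word.
WORDS = ["crane", "slate", "pious"]      # stand-in word list
word_actions = list(range(len(WORDS)))   # action i selects WORDS[i]

# Letter-level actions: pick a letter for a slot, factoring the space
# into 5 * 26 = 130 primitive actions instead of one per word.
letter_actions = [(pos, ch) for pos in range(5) for ch in string.ascii_lowercase]
```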
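The PPO baseline could be run with stable-baselines3 (an assumption; the repository may hand-roll PPO instead). CartPole stands in here for the project's Wordle gym wrapper, which is not reproduced above:

```python
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")            # stand-in for a Wordle gym.Env wrapper
model = PPO("MlpPolicy", env, verbose=0) # default MLP policy network
model.learn(total_timesteps=100_000)
```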
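A hypothetical harness for querying a GPT variant on Fibble-style histories; the prompt format, model name, and function are assumptions for illustration:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm_guess(history: list) -> str:
    """Ask the model for a next guess given (guess, feedback) history."""
    lines = [f"guess={g} feedback={f} (some colors may be lies)"
             for g, f in history]
    prompt = ("You are playing Wordle with possibly deceptive feedback.\n"
              + "\n".join(lines)
              + "\nReply with exactly one five-letter word.")
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip().lower()
```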
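And a sketch of the Weights & Biases logging loop; the project name and metric keys are hypothetical, with random values standing in for real evaluation results:

```python
import random
import wandb

run = wandb.init(project="wordle-fibble-rl", config={"n_lies": 1})
for episode in range(1_000):
    # In the real loop these come from periodic evaluation runs.
    run.log({"episode": episode,
             "win_rate": random.random(),
             "td_loss": random.random()})
run.finish()
```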
Tech Stack
Python · Q-learning · PPO · OpenAI Gym · Weights & Biases · NumPy · scikit-learn · GPT API
Results
Misinformation complexity sharply eroded agent performance: standard Wordle reached a 99.8% win rate, Fibble-1 fell to 56.82%, and Fibble-2 through Fibble-5 collapsed to under 1%, with Fibble-5 at just 0.29%. These results quantify how RL agents struggle to reconcile contradictory feedback and provided the empirical foundation for the published IEEE paper.