When people hear "AI," they often think of ChatGPT or other generative AI models categorized as LLMs (Large Language Models). However, a new AI paradigm is gaining attention—Agent AI, which I’ll explore in this article.
Before diving into Agent AI, let's first classify AI based on how it is trained and its purpose:
RL rests on a well-established theoretical foundation, which enables real-world applications such as autonomous vehicles, aircraft autopilots, and trading agents. The key concepts behind RL include:
For an RL-based trading agent, rewards act as fuel, guiding the AI toward its objective. The most basic reward function is ROI (Return on Investment) since maximizing profit is the primary goal. However, using ROI alone can lead to overfitting, where the agent performs well in training but fails in real-world trading. To prevent overfitting, additional penalties and rewards must be incorporated into the reward function, such as:
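To make this concrete, here is a minimal sketch of a composite reward function. The drawdown and trade-cost penalty terms, and their weights, are illustrative assumptions, not the exact function used by any production agent:

```python
# Sketch of a composite per-step reward for an RL trading agent.
# The penalty terms and weights below are illustrative assumptions.

def step_reward(prev_equity: float, equity: float,
                drawdown: float, traded: bool,
                drawdown_weight: float = 0.5,
                trade_cost: float = 0.001) -> float:
    """Reward = return this step, minus penalties that curb overfitting."""
    roi = (equity - prev_equity) / prev_equity   # raw return component
    penalty = drawdown_weight * drawdown         # discourage deep drawdowns
    cost = trade_cost if traded else 0.0         # discourage overtrading
    return roi - penalty - cost

# Example: equity grew 1% with a 2% drawdown and one trade executed.
# The trade cost and drawdown penalty wipe out the ROI gain.
r = step_reward(10_000.0, 10_100.0, drawdown=0.02, traded=True)
```

The point of the extra terms is that an agent trained on ROI alone happily learns strategies with deep drawdowns and constant churn; penalizing those behaviors in the reward pushes it toward policies that survive out-of-sample.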
Many traders expect AI trading agents to generate explicit buy/sell signals. However, most RL-based trading agents operate over a small, fixed set of actions, much like a multi-armed bandit choosing among predefined arms: they can only hold, buy, or sell.
To generate buy/sell signals with target prices, we could introduce additional actions, such as:
However, increasing the number of actions requires significantly more data and training steps, making the model more complex. In practice, using an agent for direct auto-trading is often a better approach than relying on discrete trading signals.
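To see how quickly the action space grows, here is a minimal sketch of a discrete action set extended with target-price actions. The action names and price offsets are hypothetical examples, not a description of a specific agent:

```python
from enum import IntEnum

# Minimal sketch of a discrete trading action space.
# The extended target-price actions below are hypothetical.

class Action(IntEnum):
    HOLD = 0
    BUY = 1
    SELL = 2

# Adding limit-style actions at offset target prices multiplies
# the number of actions the agent must learn to distinguish:
PRICE_OFFSETS = (-0.01, 0.0, 0.01)   # -1%, market price, +1%
extended_actions = [(a, off) for a in (Action.BUY, Action.SELL)
                    for off in PRICE_OFFSETS] + [(Action.HOLD, 0.0)]

print(len(extended_actions))  # 7 actions instead of 3
```

Each added offset multiplies the buy/sell branch of the space, which is why richer signal vocabularies demand so much more data and training time.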
For training a trading agent, I recommend starting with technical indicators and time-based features. As you advance, you can incorporate feature engineering, news sentiment analysis, and correlated assets to improve performance.
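As a starting point, a feature row can pair one technical indicator with a time-based input. The sketch below uses a simple 14-period RSI; the indicator choice, window, and feature names are assumptions for illustration:

```python
# Sketch of a feature row combining a technical indicator with a
# time-based input. Indicator choice and window are assumptions.

def rsi(closes, period=14):
    """Simple (non-smoothed) RSI over the last `period` price changes."""
    gains = losses = 0.0
    for prev, cur in zip(closes[-period - 1:-1], closes[-period:]):
        change = cur - prev
        if change > 0:
            gains += change
        else:
            losses -= change
    if losses == 0:
        return 100.0
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)

def features(closes, hour_of_day):
    return {
        "rsi_14": rsi(closes),   # technical indicator
        "hour": hour_of_day,     # time-based feature
    }
```

From there, news sentiment scores or the returns of correlated assets can be appended to the same row without changing the training loop.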
AI trading agents leverage reinforcement learning to make buy, sell, or hold decisions based on the data they were trained on. While they excel at fully automated trading, they aren't typically designed to emit explicit trade signals unless specifically programmed to do so. With the right input features, reward functions, and risk management strategies, AI agents can offer a powerful, automated approach to crypto trading.
This is the story of how I built Statera, my first AI-powered trading agent. But before diving into the journey, let’s first understand what a Statera agent is.
Statera is an AI trading agent designed for spot trading BTCUSDT. Its primary goal is to generate profits exceeding the simple buy-and-hold strategy while managing risks effectively. Sounds simple, right? But in reality, it took me three years to develop a version that consistently generates profits while minimizing losses.
In 2022, I had the idea of using AI to trade crypto instead of relying on manual strategies, which felt no different from gambling. I began studying algorithmic trading and quickly discovered that technical analysis was widely regarded as the key to success.
However, after extensive manual and algorithmic trading, I realized something was wrong. Despite numerous books, research papers, and tutorials claiming that technical analysis works, I couldn't find a sustainable, repeatable strategy. It took me three years to fully grasp these fundamental issues:
Some indicators work, but most are ineffective. Correlation analysis is essential for identifying useful ones.
Backtesting is crucial, but the length of your dataset determines the reliability of your strategy. Many traders lack sufficient historical data, leading to unreliable models.
Single-timeframe trading is problematic, especially for short timeframes (1m, 5m, 15m). These are highly volatile, requiring longer historical windows or multi-timeframe analysis.
Because of these misconceptions, I wasted enormous time and computing resources training ineffective agents. Despite experimenting with countless indicators, window sizes, and neural network architectures, nothing worked. By the end of 2022, after countless failed experiments, I nearly gave up, wrongly concluding that technical analysis itself was the problem.
Fast forward to 2024—AI had advanced across industries, from image processing to autonomous vehicles. Inspired by this progress, I revisited my trading agent with fresh insights. I realized my failure wasn’t due to AI itself but because my approach was flawed.
The key issue? Overfitting.
My agent performed exceptionally well on training data but failed in real-world scenarios. Fixing this problem became my primary focus throughout 2024. While I don’t remember the exact sequence of my discoveries, here are the most critical lessons I learned:
Input Sensitivity Matters - Agents are highly sensitive to inputs, making input diversification crucial. However, real-world data limitations exist: a 1-hour timeframe dataset with 100,000 rows covers only about 11.4 years, which is roughly the minimum needed for training. My initial workaround, training across multiple assets, failed. Instead, adding Gaussian noise to my limited dataset (50,000 rows) improved generalization.
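A minimal sketch of that Gaussian-noise augmentation idea; the noise scale (0.1% of each value) and the number of noisy copies are illustrative assumptions:

```python
import random

# Sketch of Gaussian-noise data augmentation for a limited dataset.
# The noise scale and copy count are assumed hyperparameters.

def augment(rows, noise_pct=0.001, copies=1, seed=42):
    rng = random.Random(seed)
    out = list(rows)                      # keep the originals
    for _ in range(copies):
        for row in rows:
            # perturb each value by noise proportional to its magnitude
            out.append([x + rng.gauss(0.0, abs(x) * noise_pct) for x in row])
    return out

data = [[100.0, 55.2], [101.5, 48.7]]
augmented = augment(data, copies=2)
print(len(augmented))  # 2 originals + 4 noisy copies = 6
```

Because each copy is a slightly perturbed view of the same market history, the agent can no longer memorize exact price sequences, which is the mechanism behind the improved generalization.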
Neural Network Complexity ≠ Better Performance - Initially, I assumed that increasing model complexity would yield better results. I wasted significant time tweaking hidden layers, node sizes, and hyperparameters. Instead of blindly randomizing parameters, systematic experimentation proved far more effective in finding the optimal balance between underfitting and overfitting.
The Market Has Seasonal Patterns - Asset prices fluctuate based on time. Incorporating time as an input feature significantly improved my agent’s performance.
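One standard way to feed time into a model is a cyclical sine/cosine encoding, so the model sees that hour 23 neighbors hour 0 and December neighbors January. The sketch below is illustrative, not Statera's exact encoding:

```python
import math

# Sketch: encode a periodic time value as (sin, cos) so the model
# sees the wrap-around (hour 23 -> 0, December -> January).

def cyclical(value: float, period: float) -> tuple:
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

hour_feats = cyclical(23, 24)    # lands near cyclical(0, 24)
month_feats = cyclical(12, 12)   # December wraps around to January
```

Fed as raw integers, hour 23 and hour 0 look maximally far apart; the cyclical form lets the network exploit seasonal and intraday patterns directly.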
“Reward per Step” is the Best Performance Indicator - The most effective way to measure model fitness isn’t raw profitability but rather reward per step. Crafting an optimal reward function is a complex topic I’ll cover in a future article.
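The metric itself is simple: total accumulated reward divided by the number of environment steps. A sketch of why it beats raw profit as a fitness measure:

```python
# Sketch: reward per step as a fitness metric. Two runs with equal
# total reward are not equally fit if one needed far more steps.

def reward_per_step(total_reward: float, steps: int) -> float:
    return total_reward / steps

print(reward_per_step(120.0, 1_000))   # 0.12
print(reward_per_step(120.0, 10_000))  # 0.012, same profit but worse fit
```

Normalizing by steps makes runs of different lengths comparable, so a model that grinds out the same profit over ten times as many decisions is correctly ranked lower.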
Hyperparameter Tuning Requires Multi-Stage Training - Standard reinforcement learning (RL) tutorials often use a single set of hyperparameters. However, I found that splitting training into 2–5 phases, with gradual hyperparameter adjustments, yielded better models.
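A sketch of such a staged schedule. The stage count, step budgets, and hyperparameter values are entirely assumed, and the `model` here is a stand-in dict rather than a real RL library object; the point is only that each phase resumes from the previous one with adjusted hyperparameters:

```python
# Sketch of multi-stage training: each phase resumes from the last
# with adjusted hyperparameters. All values below are illustrative.

STAGES = [
    {"steps": 1_000_000, "learning_rate": 3e-4, "ent_coef": 0.01},
    {"steps": 1_000_000, "learning_rate": 1e-4, "ent_coef": 0.005},
    {"steps":   500_000, "learning_rate": 3e-5, "ent_coef": 0.001},
]

def train(model, stages):
    for stage in stages:
        # in a real setup this would update the optimizer and call
        # the library's training loop for `stage["steps"]` steps
        model["learning_rate"] = stage["learning_rate"]
        model["ent_coef"] = stage["ent_coef"]
        model["trained_steps"] = model.get("trained_steps", 0) + stage["steps"]
    return model

model = train({}, STAGES)
print(model["trained_steps"])  # prints 2500000
```

Tapering the learning rate and entropy coefficient across phases lets early training explore broadly while later phases fine-tune without destroying what was learned.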
As of March 2025, Statera has been trained on BTCUSDT for approximately 4 million steps—equivalent to 456 years of historical data. The architecture consists of:
Features: Technical indicators + time-based inputs
Model: LSTM for sequential data, followed by a linear model
Training Method: PPO (Proximal Policy Optimization)
After testing the agent with a limited 3-month dataset, here’s what I found:
Profitability: 20–40% gains over 3 months (Oct 2024–Jan 2025)
Market Benchmark: BTCUSDT returned 21% during the same period
Consistency: PPO does not produce deterministic actions but instead outputs probability distributions. Across multiple backtests, Statera consistently outperformed buy-and-hold strategies.
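To illustrate why individual backtests differ, here is a sketch of sampling actions from a policy's probability distribution; the probabilities are made up for illustration:

```python
import random

# Sketch: a PPO policy outputs a probability distribution over
# actions, so each backtest samples a different action sequence.
# The probabilities below are made up for illustration.

def sample_action(probs, rng):
    r, acc = rng.random(), 0.0
    for action, p in enumerate(probs):
        acc += p
        if r <= acc:
            return action
    return len(probs) - 1

policy_output = [0.7, 0.2, 0.1]   # P(hold), P(buy), P(sell)
rng = random.Random(0)
actions = [sample_action(policy_output, rng) for _ in range(10_000)]
print(actions.count(0) / len(actions))  # roughly 0.7
```

Because every run draws a different action sequence, judging the agent requires aggregating many backtests rather than trusting any single equity curve.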
Given these results, I decided to release the agent for further testing under real-market conditions.
My next goal is to develop a flagship model that trades on a 1-minute timeframe using multi-timeframe inputs. The process is ongoing, and I’ll share more insights as I refine the model. Stay tuned for future updates on my blog!
Thanks for reading!
This agent is designed to maximize profit for spot trading while balancing profit and risk. It uses the latest 48 hours of price data and the current state of the agent. Inputs include the current time and indicators such as RSI, Stoch, ADX, ATR, OBV, and Bollinger Bands.
When I first observed this pattern, I felt a chill run down my spine: it was hard to believe how strongly the month of the year influences Bitcoin's price movements. But trust me, data doesn't lie. In this article, I'll walk you through the fascinating insights I've uncovered through data science.
Crypto News Sentiment (CNS) is a crucial indicator for cryptocurrency traders. It offers real-time insights into the overall sentiment surrounding the crypto market, derived from a comprehensive analysis of news articles.
In this post, I'll delve into:
By understanding CNS, you can gain a competitive edge and make more strategic trades in the dynamic crypto market. To calculate Crypto News Sentiment (CNS), a sophisticated AI bot scours numerous crypto-related news sources. This bot analyzes the content of these articles, extracting sentiment values categorized as positive, negative, or neutral. To provide a comprehensive overview, the individual sentiment values are then aggregated into a single score ranging from -1 (strongly negative) to 1 (strongly positive). This aggregated score represents the prevailing sentiment within the crypto news landscape.
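Under the simplest reading of that aggregation (an unweighted average over a {-1, 0, +1} label mapping, which is my assumption; the real pipeline may weight sources or recency differently), the score could be computed as:

```python
# Sketch of aggregating per-article sentiment labels into one CNS
# score in [-1, 1]. The mapping and plain averaging are assumptions;
# a production pipeline may weight sources or recency.

SENTIMENT_VALUE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}

def cns_score(labels):
    if not labels:
        return 0.0
    return sum(SENTIMENT_VALUE[l] for l in labels) / len(labels)

print(cns_score(["positive", "positive", "neutral", "negative"]))  # 0.25
```

A mildly positive day of headlines thus produces a score near zero on the positive side, while a uniformly negative news cycle pushes it toward -1.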
While initial skepticism about the relationship between sentiment analysis and market price was warranted, extensive research and experimentation, including reinforcement learning models, have demonstrated that sentiment acts as an independent variable: news sentiment often precedes and influences market price movements rather than simply reacting to them. This insight underscores the predictive power of CNS as a trading tool.
In the top corner, you'll find two CNS values: a 24-hour value and a 4-hour value. These represent the sentiment aggregated over the past 24 hours and 4 hours, respectively, allowing a comparison of short-term and longer-term sentiment trends and providing valuable insight into market volatility and potential price movements.
The potential applications of sentiment analysis in the crypto market are vast and exciting. As I continue my research, I'll be sharing additional insights and discoveries on this website. Stay tuned for updates, and leverage the power of CNS to make informed trading decisions.