This is a complete machine learning pipeline that trains an XGBoost model to predict BTC's direction in 5-minute windows. Unlike earlier tutorials that used technical indicators (MACD, RSI), this approach feeds the model raw candle shapes - the actual body size, wick length, and volume of each 1-minute candle - and lets the AI figure out what matters.
XGBoost (eXtreme Gradient Boosting) is one of the most popular machine learning algorithms for structured data. It builds an ensemble of decision trees sequentially, with each new tree trained to correct the errors of the trees before it.
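To make "trees that correct earlier trees" concrete, here is a toy sketch of boosting with one-split trees (stumps) on made-up data. It is not how XGBoost is implemented internally (XGBoost adds second-order gradients and regularization); the function names and the sine-wave data are assumptions for illustration only.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single threshold split on x that best fits the residuals."""
    best = None
    for threshold in np.unique(x)[:-1]:      # skip the max so the right side is never empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_trees=50, learning_rate=0.1):
    """Toy boosting for squared error: each stump fits the current residuals."""
    prediction = np.full(len(y), y.mean())
    stumps, errors = [], []
    for _ in range(n_trees):
        residual = y - prediction            # the ensemble's current mistakes
        stump = fit_stump(x, residual)
        prediction = prediction + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
        errors.append(float(np.mean((y - prediction) ** 2)))
    return stumps, errors

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, size=x.shape)   # made-up data
stumps, errors = boost(x, y)
print(f"Training MSE after 1 tree: {errors[0]:.4f}, after 50 trees: {errors[-1]:.4f}")
```

The training error shrinks as trees are added, which is also why a validation set is needed: training error alone cannot tell you when to stop.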
Instead of pre-computing MACD, RSI, or Bollinger Bands, this approach uses the raw shape of each candle: body size, upper wick length, lower wick length, and volume relative to its recent average.
15 candles x 4 features = 60 raw features, plus aggregate stats = ~75 total features.
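As a rough sketch of how the three shape features could be computed with NumPy, following the formulas used later in `build_features()`: the helper name `candle_metrics` and the two sample candles are made up for illustration, and the volume ratio is left out because it needs a rolling average.

```python
import numpy as np

def candle_metrics(opens, highs, lows, closes):
    """Per-candle shape features, each expressed as a percentage of the open price."""
    opens, highs, lows, closes = (
        np.asarray(a, dtype=float) for a in (opens, highs, lows, closes)
    )
    body_top = np.maximum(opens, closes)      # close for green candles, open for red
    body_bottom = np.minimum(opens, closes)   # open for green candles, close for red
    return {
        "body_pct": (closes - opens) / opens * 100,
        "upper_wick_pct": (highs - body_top) / opens * 100,
        "lower_wick_pct": (body_bottom - lows) / opens * 100,
    }

# Two made-up candles: the first is green, the second is red
metrics = candle_metrics(
    opens=[100.0, 200.0],
    highs=[103.0, 202.0],
    lows=[99.0, 194.0],
    closes=[102.0, 196.0],
)
for name, values in metrics.items():
    print(name, values)
```

Note that `body_pct` is signed (negative for red candles), while both wick features are always zero or positive.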
| Step | Function | What Happens |
|---|---|---|
| 1 | load_raw_data() | Load 1-minute BTC/USD OHLCV data from CSV |
| 2 | build_features() | Extract ~75 features (60 raw candle features plus aggregates) per 5-min window |
| 3 | time_series_split() | Split into train (70%) / validation (15%) / test (15%) |
| 4 | train_model() | Train XGBoost classifier with early stopping |
| 5 | evaluate_model() | Test accuracy, P&L simulation, feature importance |
| 6 | save_model() | Save trained model to disk for later use |
def build_features(df):
    # Pre-compute raw candle metrics for every 1-min candle
    body_pct = (closes - opens) / opens * 100
    upper_wick_pct = (highs - max_opens_closes) / opens * 100
    lower_wick_pct = (min_opens_closes - lows) / opens * 100
    vol_ratio = volumes / rolling_mean_volume
    is_green = (closes > opens).astype(float)

    # For each 5-minute window...
    for w in range(n_windows):
        # Label: 1=UP if last close >= first open, else 0=DOWN
        label = 1 if last_close >= first_open else 0

        # Look back at 15 candles before this window
        for i in range(1, LOOKBACK + 1):
            row[f"candle_{i}_body"] = body_pct[idx]
            row[f"candle_{i}_upper_wick"] = upper_wick_pct[idx]
            row[f"candle_{i}_lower_wick"] = lower_wick_pct[idx]
            row[f"candle_{i}_vol"] = vol_ratio[idx]

        # Add aggregate features...
        # Green count, avg body size, wick ratio, streaks, etc.
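What it does: This is the feature engineering step - often the part of an ML project that affects results the most. It looks at the 15 one-minute candles before each 5-minute window and extracts 4 features per candle (body, upper wick, lower wick, volume ratio). Then it adds aggregate statistics (green candle counts, average sizes, streaks, etc.). Because every feature comes from candles before the window, the model never sees the prices it is asked to predict.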
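The indexing is the easy part to get wrong, so here is a minimal sketch of building one sample. It assumes `candle_1` is the candle immediately before the window and `candle_15` the oldest; the excerpt does not show how `idx` is derived, so that ordering, the helper name `build_row`, and the sample prices are all assumptions.

```python
import numpy as np

LOOKBACK = 15   # candles of history per sample (from the text)
WINDOW = 5      # one-minute candles per prediction window (from the text)

def build_row(start, opens, closes, per_candle):
    """Build the label and the raw lookback features for the window starting at `start`.

    per_candle maps a feature suffix ("body", "vol", ...) to an array with one
    value per one-minute candle.
    """
    first_open = opens[start]
    last_close = closes[start + WINDOW - 1]
    row = {"label": 1 if last_close >= first_open else 0}
    for i in range(1, LOOKBACK + 1):
        idx = start - i                       # strictly before the window: no look-ahead
        for suffix, values in per_candle.items():
            row[f"candle_{i}_{suffix}"] = values[idx]
    return row

n = 40
opens = np.arange(100.0, 100.0 + n)           # made-up, steadily rising prices
closes = opens + 0.5
per_candle = {
    "body": (closes - opens) / opens * 100,
    "upper_wick": np.zeros(n),
    "lower_wick": np.zeros(n),
    "vol": np.ones(n),
}
row = build_row(20, opens, closes, per_candle)
print("raw features:", len(row) - 1, "label:", row["label"])
```

The first usable window must start at index `LOOKBACK` or later, otherwise `start - i` would wrap around to the end of the array.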
model = xgb.XGBClassifier(
    n_estimators=1000,          # Up to 1000 trees
    max_depth=4,                # Shallow trees (prevent overfitting)
    learning_rate=0.01,         # Small steps for better generalization
    subsample=0.7,              # Each tree sees a random 70% of the training rows
    colsample_bytree=0.5,       # Each tree sees a random 50% of the features
    min_child_weight=100,       # Minimum sum of instance weights (hessian) per leaf
    early_stopping_rounds=100,  # Stop if no validation improvement for 100 rounds
)
# Early stopping monitors the last eval_set entry (the validation set)
model.fit(X_train, y_train,
          eval_set=[(X_train, y_train), (X_val, y_val)])
What it does: Trains an XGBoost model with conservative hyperparameters chosen to limit overfitting. Early stopping is key - XGBoost monitors performance on the validation set (the last entry in eval_set) and halts training once that metric has not improved for 100 rounds.
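The stopping rule itself is simple enough to sketch in plain Python. This is an illustration of the idea, not XGBoost's internal code; the function name `best_iteration` and the loss values are made up.

```python
def best_iteration(val_losses, patience):
    """Return (best_round, stopped_round) for a sequence of validation losses.

    Training stops at the first round that is `patience` rounds past the best
    round without any improvement.
    """
    best_round, best_loss = 0, float("inf")
    for round_idx, loss in enumerate(val_losses):
        if loss < best_loss:
            best_round, best_loss = round_idx, loss
        elif round_idx - best_round >= patience:
            return best_round, round_idx
    return best_round, len(val_losses) - 1

# Made-up validation losses: improving at first, then slowly getting worse
losses = [0.693, 0.691, 0.690, 0.6895, 0.6897, 0.6899, 0.6902, 0.6910]
print(best_iteration(losses, patience=3))   # -> (3, 6)
```

Here the best loss occurs at round 3, and training halts at round 6 after three rounds without improvement. With `early_stopping_rounds=100`, the same logic applies with a patience of 100.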
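The evaluation produces several key outputs: accuracy on each split, a classification report, a confusion matrix, a simulated P&L curve with maximum drawdown, a feature importance ranking, a confidence threshold analysis, and win rates by session and hour.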
    |         <- Upper Wick (price went up but got rejected)
    |
 ---|---      <- Top of body: Close for a green candle, Open for a red one
 |     |
 |Body |      <- The "real" move: Open to Close
 |     |
 ---|---      <- Bottom of body: Open for a green candle, Close for a red one
    |
    |         <- Lower Wick (price dipped but buyers stepped in)
Green candle (bullish): Close > Open. Body shows upward move.
Red candle (bearish): Close < Open. Body shows downward move.
Long upper wick: Sellers pushed price down from the high = selling pressure
Long lower wick: Buyers pushed price up from the low = buying pressure
pip install xgboost scikit-learn scipy pandas numpy termcolor
| Term | Meaning |
|---|---|
| XGBoost | eXtreme Gradient Boosting - a powerful ML algorithm for tabular data |
| Feature Engineering | Creating useful inputs for the model from raw data |
| Decision Tree | A model that makes predictions by following a series of yes/no questions |
| Overfitting | When a model memorizes training data but fails on new data |
| Early Stopping | Stopping training when validation performance stops improving |
| Validation Set | Data held out during training to check for overfitting |
| Feature Importance | Which input features the model relied on most |
| Confidence Threshold | Only trading when the model's prediction probability is high enough |
| Classification Report | Precision, recall, and F1-score per class (UP vs DOWN) |
| Confusion Matrix | A table showing actual vs predicted classifications |
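The confusion matrix and classification report entries can be made concrete with a few lines of plain Python. The labels and predictions here are made up, and UP (1) is treated as the positive class; in the pipeline itself, scikit-learn's `confusion_matrix` and `classification_report` do this work.

```python
def confusion_counts(actual, predicted):
    """Count the four outcomes, treating 1 (UP) as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)   # said UP, was UP
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)   # said UP, was DOWN
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)   # said DOWN, was UP
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)   # said DOWN, was DOWN
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

actual    = [1, 1, 1, 0, 0, 0, 1, 0]   # made-up true directions
predicted = [1, 0, 1, 0, 1, 0, 1, 1]   # made-up model predictions

tp, fp, fn, tn = confusion_counts(actual, predicted)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this example the model is right 3 out of the 5 times it says UP (precision 0.60) and catches 3 of the 4 actual UP windows (recall 0.75).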
# --- PYTHON ---
def build_features(df):
    body_pct = (closes - opens) / opens * 100
    upper_wick_pct = (highs - max_oc) / opens * 100
    lower_wick_pct = (min_oc - lows) / opens * 100
    vol_ratio = volumes / rolling_mean_volume
    is_green = (closes > opens).astype(float)

    for w in range(n_windows):
        label = 1 if last_close >= first_open else 0

        for i in range(1, LOOKBACK + 1):
            row[f"candle_{i}_body"] = body_pct[idx]
            row[f"candle_{i}_upper_wick"] = upper_wick_pct[idx]
            row[f"candle_{i}_lower_wick"] = lower_wick_pct[idx]
            row[f"candle_{i}_vol"] = vol_ratio[idx]

        row["green_count_15"] = is_green[lb_slice].sum()
        row["avg_body_size_15"] = abs_body_15.mean()
        row["consecutive_same"] = streak
        row["return_skew_30"] = skew(ret_slice)
        row["hour"] = dt.hour
        row["session"] = session  # 0=Asia, 1=Europe, 2=US (derived from dt.hour)
# --- PSEUDO-CODE ---
FUNCTION build_features(dataframe):
    PRE-COMPUTE for every 1-minute candle:
        body_pct = how big the candle body is (open to close) as a percentage of the open
        upper_wick_pct = how far above the body the price reached (selling pressure)
        lower_wick_pct = how far below the body the price dipped (buying pressure)
        vol_ratio = current volume divided by 30-candle average volume
        is_green = 1 if close > open (bullish candle), 0 otherwise

    FOR every group of 5 consecutive candles (a 5-minute window):
        DETERMINE the label:
            1 (UP) if the last close price >= first open price
            0 (DOWN) if the last close price < first open price

        LOOK BACK at the 15 candles BEFORE this window:
            FOR each of the 15 previous candles:
                RECORD its body size percentage
                RECORD its upper wick size percentage
                RECORD its lower wick size percentage
                RECORD its volume ratio (is volume unusual?)

        COMPUTE aggregate statistics:
            How many of the last 15 candles were green (bullish)?
            What's the average body size over 15 candles?
            What's the ratio of upper wicks to lower wicks (sell vs buy pressure)?
            Are candle bodies getting bigger or smaller recently?
            What's the longest streak of consecutive same-direction candles?
            What's the skewness of recent returns? (asymmetric distribution?)
            What hour of the day is it?
            What trading session? (Asia=0, Europe=1, US=2)
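Two of the aggregate statistics above are easy to sketch in plain Python. The streak helper follows the "longest streak" wording of the pseudo-code; the session boundaries (UTC hours 0-7, 8-15, 16-23) are purely hypothetical, since the excerpt does not say where the pipeline draws them.

```python
def longest_streak(is_green):
    """Length of the longest run of consecutive same-direction candles."""
    longest = current = 0
    previous = None
    for value in is_green:
        current = current + 1 if value == previous else 1
        longest = max(longest, current)
        previous = value
    return longest

def session_for_hour(hour):
    """Map an hour (0-23) to a session code using HYPOTHETICAL UTC boundaries."""
    if hour < 8:
        return 0   # Asia
    if hour < 16:
        return 1   # Europe
    return 2       # US

# Made-up colours for 15 lookback candles (1 = green, 0 = red)
is_green = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1]

print("green_count_15:", sum(is_green))
print("longest streak:", longest_streak(is_green))
print("session at 14:00:", session_for_hour(14))
```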
# --- PYTHON ---
def time_series_split(df):
    n = len(df)
    train_end = int(n * 0.70)
    val_end = int(n * 0.85)
    train = df.iloc[:train_end].copy()
    val = df.iloc[train_end:val_end].copy()
    test = df.iloc[val_end:].copy()
    return train, val, test
# --- PSEUDO-CODE ---
FUNCTION time_series_split(dataframe):
    COUNT total rows
    CALCULATE split points:
        First 70% = TRAINING data (the model learns from this)
        Next 15% = VALIDATION data (to check during training for overfitting)
        Last 15% = TEST data (final evaluation, never seen by model)
    IMPORTANT: DO NOT shuffle! Time must flow forward.
        Training = oldest data
        Validation = middle data
        Test = newest data
    RETURN the three split datasets
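A quick way to convince yourself the split respects time order is to run it on a small frame and compare timestamps across the boundaries. This sketch repeats the split function so it runs on its own; the `timestamp` column name and the 100-row frame are assumptions, and the frame must already be sorted by time.

```python
import pandas as pd

def time_series_split(df):
    """Chronological 70/15/15 split; rows must already be sorted oldest to newest."""
    n = len(df)
    train_end = int(n * 0.70)
    val_end = int(n * 0.85)
    train = df.iloc[:train_end].copy()
    val = df.iloc[train_end:val_end].copy()
    test = df.iloc[val_end:].copy()
    return train, val, test

# Made-up frame: 100 one-minute timestamps with a dummy label
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=100, freq="min"),
    "label": [i % 2 for i in range(100)],
})
train, val, test = time_series_split(df)

print(len(train), len(val), len(test))
# Every training row is older than every validation row, and so on
print(train["timestamp"].max() < val["timestamp"].min() < test["timestamp"].min())
```

If the second line ever printed `False`, future data would be leaking into training and the test accuracy would be misleadingly high.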
# --- PYTHON ---
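def train_model(train, val, feature_cols):
    # Assumes the target column is named "label" (not shown in this excerpt)
    X_train, y_train = train[feature_cols], train["label"]
    X_val, y_val = val[feature_cols], val["label"]
    model = xgb.XGBClassifier(
        n_estimators=1000, max_depth=4, learning_rate=0.01,
        subsample=0.7, colsample_bytree=0.5, min_child_weight=100,
        early_stopping_rounds=100)
    # Early stopping monitors the last eval_set entry (the validation set)
    model.fit(X_train, y_train,
              eval_set=[(X_train, y_train), (X_val, y_val)])
    return model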
# --- PSEUDO-CODE ---
FUNCTION train_model(training data, validation data, feature columns):
    CREATE an XGBoost classifier with these settings:
        n_estimators=1000: build up to 1000 decision trees
        max_depth=4: each tree can only be 4 levels deep (prevents overfitting)
        learning_rate=0.01: each tree's output is scaled by 0.01 before it is added to the ensemble
        subsample=0.7: each tree only sees a random 70% of the training rows (prevents overfitting)
        colsample_bytree=0.5: each tree only sees a random 50% of features (prevents overfitting)
        min_child_weight=100: each leaf needs a summed instance weight (hessian) of at least 100; for a binary classifier that means roughly 400 or more samples per leaf (prevents overfitting)
        early_stopping_rounds=100: if no validation improvement for 100 rounds, stop training
    TRAIN the model:
        Feed it training data (features + labels)
        After each tree, CHECK performance on validation data
        KEEP the best version of the model
    RETURN the trained model
# --- PYTHON ---
def evaluate_model(model, train, val, test, feature_cols):
    # Assumes the target column is named "label" (not shown in this excerpt)
    for name, split in [("TRAIN", train), ("VAL", val), ("TEST", test)]:
        X, y = split[feature_cols], split["label"]
        preds = model.predict(X)
        acc = accuracy_score(y, preds)

    # After the loop, X, y, and preds all refer to the TEST split
    X_test, y_test = X, y
    report = classification_report(y_test, preds)
    cm = confusion_matrix(y_test, preds)

    correct = (preds == y_test)
    win_rate = correct.sum() / len(preds) * 100
    pnl_per_trade = np.where(correct, WIN_PROFIT, -LOSS_AMOUNT)
    cumulative_pnl = np.cumsum(pnl_per_trade)

    importance = model.get_booster().get_score(importance_type="gain")

    proba = model.predict_proba(X_test)
    max_proba = proba.max(axis=1)  # Probability of the predicted class
    for thresh in [0.55, 0.60, 0.65, 0.70]:
        mask = max_proba >= thresh
        # ... filter and recalculate
# --- PSEUDO-CODE ---
FUNCTION evaluate_model(model, train, val, test, features):
    STEP 1: Measure accuracy on each dataset:
        ASK model to predict UP/DOWN for training data -> check accuracy
        ASK model to predict UP/DOWN for validation data -> check accuracy
        ASK model to predict UP/DOWN for test data -> check accuracy
        (Test accuracy is the most important - it's never-seen-before data)
    STEP 2: Print classification report:
        For both UP and DOWN predictions:
            Precision: when it says UP, how often is it right?
            Recall: of all actual UPs, how many did it catch?
            F1-score: balance of precision and recall
    STEP 3: Show confusion matrix:
        2x2 table: "Predicted UP vs DOWN" x "Actual UP vs DOWN"
    STEP 4: Simulate Polymarket P&L:
        FOR each test prediction:
            IF correct: ADD $8.52 profit
            IF wrong: SUBTRACT $10.00 loss
        CALCULATE cumulative P&L over time
        CALCULATE maximum drawdown (worst peak-to-trough decline)
    STEP 5: Feature importance:
        RANK all ~75 features by how much they contributed to predictions
        SHOW top 20
    STEP 6: Confidence threshold analysis:
        FOR each threshold (55%, 60%, 65%, 70%):
            ONLY count predictions where the model is this confident
            CHECK: does filtering for high confidence improve win rate?
            TRADE-OFF: higher threshold = fewer trades, and ideally a higher win rate
    STEP 7: Time breakdown:
        WIN RATE by trading session (Asia / Europe / US)
        WIN RATE by hour of the day
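Steps 4 and 6 can be sketched with NumPy as follows. The payout figures come from the pseudo-code above; the helper names, the sample outcomes, and the sample probabilities are made up, so the printed numbers say nothing about the real model. One consequence of the asymmetric payout is worth noting: with $8.52 won versus $10.00 lost, the break-even win rate is 10.00 / (8.52 + 10.00), about 54%, so 50% accuracy loses money.

```python
import numpy as np

WIN_PROFIT = 8.52    # profit per correct prediction (from the text)
LOSS_AMOUNT = 10.00  # loss per wrong prediction (from the text)

def break_even_win_rate(win_profit=WIN_PROFIT, loss_amount=LOSS_AMOUNT):
    """Win rate p at which p * win_profit equals (1 - p) * loss_amount."""
    return loss_amount / (win_profit + loss_amount)

def simulate_pnl(correct):
    """Cumulative P&L and maximum drawdown for a boolean sequence of outcomes."""
    pnl_per_trade = np.where(np.asarray(correct, dtype=bool), WIN_PROFIT, -LOSS_AMOUNT)
    cumulative = np.cumsum(pnl_per_trade)
    # Running peak of the equity curve, counting the starting balance of 0
    running_peak = np.maximum.accumulate(np.concatenate([[0.0], cumulative]))[1:]
    max_drawdown = float((running_peak - cumulative).max())
    return cumulative, max_drawdown

def win_rate_by_threshold(proba_up, y_true, thresholds=(0.55, 0.60, 0.65, 0.70)):
    """Trade count and win rate when trading only at or above each confidence level."""
    proba_up = np.asarray(proba_up, dtype=float)
    y_true = np.asarray(y_true)
    preds = (proba_up >= 0.5).astype(int)
    confidence = np.maximum(proba_up, 1.0 - proba_up)   # probability of the predicted class
    results = {}
    for thresh in thresholds:
        mask = confidence >= thresh
        n_trades = int(mask.sum())
        win_rate = float((preds[mask] == y_true[mask]).mean()) if n_trades else float("nan")
        results[thresh] = (n_trades, win_rate)
    return results

print(f"break-even win rate: {break_even_win_rate():.2%}")

cumulative, max_drawdown = simulate_pnl([True, True, False, False, True])
print("final P&L:", round(float(cumulative[-1]), 2), "max drawdown:", round(max_drawdown, 2))

results = win_rate_by_threshold(
    proba_up=[0.52, 0.58, 0.32, 0.72, 0.43, 0.67],   # made-up P(UP) values
    y_true=[0, 1, 0, 1, 1, 1],                        # made-up true directions
)
for thresh, (n_trades, win_rate) in results.items():
    print(f"threshold {thresh:.2f}: {n_trades} trades, win rate {win_rate:.0%}")
```

When reading the threshold table, compare each filtered win rate against the break-even rate rather than against 50%, and keep an eye on the trade count: a high win rate over a handful of trades is weak evidence.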