Exploring BTC Momentum Misclassifications: A Retrieval-Augmented Generation (RAG) Approach
SUPERVISOR: Yiu Siu Ming
Group: Ezen CHONG, Gillian Ru Qu TOO, Keith Kai Xuan CHAN, Zi Zheng FONG HU

Final Presentation
PROJECT INTRODUCTION
Can we better understand how narratives affect cryptocurrencies?
The primary objective of this project is to develop and evaluate a hybrid sentiment analysis framework that combines traditional ML classification with RAG-based interpretation to predict daily Bitcoin price momentum, targeting a classification accuracy of at least 65% on out-of-sample data, motivated by addressing a key limitation in current sentiment analysis approaches – the lack of narrative context in prediction models.
Project Schedule
Planning Stage
Sept 2024
- Project Brainstorming
- Detailed Project Plan and Initial Setup of Project Webpage
Data Preprocessing
November 2024
- Initiate data collection
- Utilize open-sourced models like LLAMA
- import models locally via techniques such as QLORA
Training the Model
January 2025
- Sentiment Analysis Model Development
- Machine Learning Model Training
- Correlation and Prediction Model Development
- Preliminary Implementation and Interim Report
Refinement and Optimization
March 2025
- Optimize models based on evaluation results
- Perform extensive testing with a larger dataset
- Analyze model robustness and reliability
- Finalised Implementation
Concluding the Project
April 2025
- Comprehensive Evaluation
- Final Report and Documentation
Deliverables
Project Plan
Interim Report
Final Report
Methodology

Our methodology is organized into four main sections: Architecture & Data Collection, Data Preprocessing, Machine Learning Models, and RAG Framework.
Architecture & Data Collection
The overall model architecture is designed to integrate multiple data sources to capture both technical signals and market sentiment to predict Bitcoin price momentum. The three primary data sources are
- Historical Bitcoin Price Data from Yahoo Finance
- Sentiment Data from Cryptocurrency and Market News through the News API
- Fear and Greed Index Data through their API
Data Preprocessing
A robust preprocessing step is essential to ensure that both sentiment data and price data are harmonized, which allows for an integrated analysis in later parts of our modeling and evaluation. After processing the merged dataset is separated into 4 distinct sentiment sets:
- With BTC Sentiment: This feature set incorporated sentiment scores derived from crypto-specific news related exclusively to Bitcoin.
- With Market Sentiment: This feature set incorporated sentiment scores derived from market news.
- With F&G Sentiment (Alternative Crypto Fear & Greed Index): In this variation, the sentiment proxy was the Fear & Greed Index value.
- No Sentiment Indicators: This baseline feature set consisted solely of technical price data (e.g., Open, High, Low, Close, and Volume) without any sentiment information.
Machine Learning Models
Each model will be trained on the standardized and pre-processed data as mentioned in the above section, with varying sentiment datasets as well as a common target variable, which is defined as the next day’s directional movement (upwards/downwards), which is determined from the daily log returns of BTC’s price. This experimental design allows us to assess the incremental value of sentiment indicators on our prediction task and to further handle refining and understanding on a deeper level as to which data sets perform better and why the results are as such. The chosen models include
Our chosen models include logistic regression, decision trees, gradient boosting, naïve Bayes, support vector machines (SVM), neural networks, and random forests (with XGBoost serving as a benchmark). For evaluation, accuracy was used, given that our dataset was fairly balanced.
RAG Framework

The system first uses machine learning to predict cryptocurrency prices. When the model makes errors, the system identifies the dates of those errors and retrieves aggregated news from those dates. This news is then fed into a RAG model, which is prompted to explain how the news might have influenced prices in the opposite direction of the prediction. Traders then review the RAG output to identify relevant narratives, implicitly refining their understanding of market dynamics and improving future trading decisions.