An Enhanced Note Assistance Application with Handwriting Recognition Leveraging LLM


Supervisor: Prof. Luo Ping

Group Member: Deng Jiaqi (3035832490), Gu Zhuangcheng (3035827110), Xie Changhe (3035770575), Zhou Zihan (3035772640)

Introduction

In today’s world, many of us rely on handwritten notes, whether on paper or on digital devices such as iPads. However, interpreting these notes can be challenging. Most existing note-taking applications focus on basic functionalities such as text transcription and keyword search, and do not handle diverse inputs such as handwritten notes, sketches, and complex visual data effectively. They also lack context-aware querying and intelligent content categorization, making it difficult for users to organize and interact with their notes, especially when the notes combine text, diagrams, and annotations.

Therefore, we introduce an innovative note assistant application that combines the strengths of general-purpose and specialized LLMs to offer a more interactive and structured note-management experience. This project presents a tool that transforms the note-taking process through handwriting recognition, sketch conversion, and note question-answering (QA). Using large language models (LLMs) and optical character recognition (OCR), the application converts handwritten drafts into organized, searchable, and insightful digital notes. It transforms rough sketches into clean formats such as Markdown and enables users to query their notes for deeper insights.

Objective

This note assistant application aims to help users organize and understand their notes more efficiently and effectively.

Objective 1: Written Content Recognition

The project introduces a Vision Language Model (VLM) for detecting and recognizing user-written content, including both text and graphical elements, enabling the conversion of handwritten drafts and notes into neatly structured digital formats.

Objective 2: Document QA

This project presents a Large Language Model (LLM) fine-tuned for document question-answering (DocQA) in lecture note-taking scenarios. Combined with techniques such as Retrieval-Augmented Generation (RAG), it supports document understanding and allows users to query their notes for detailed insights and information.
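To make the RAG-based DocQA flow concrete, the sketch below shows a minimal retrieval-augmented pipeline: note chunks are ranked against the question and the top matches are packed into a prompt for the QA model. The bag-of-words embedding and the prompt format are illustrative placeholders, not the project's fine-tuned models or actual retrieval stack.

```python
# Minimal sketch of retrieval-augmented DocQA over note chunks.
# The toy embedding and prompt format are placeholders, not the project's pipeline.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a neural encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank note chunks by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, chunks: list[str]) -> str:
    # Retrieved chunks are prepended as context for the QA model.
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

notes = [
    "Lecture 3: gradient descent updates weights in the direction of the negative gradient.",
    "Lecture 4: convolution layers share weights across spatial positions.",
    "Admin: the midterm covers lectures 1 to 5.",
]
print(build_prompt("How does gradient descent update the weights?", notes))
```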

Objective 3: Interactive Application

This project offers an interactive mobile/desktop application that seamlessly integrates OCR and QA capabilities, transforming handwritten content into clean, organized, searchable, and easy-to-understand digital notes.

Methodology

1 – App Development

  • The system adopts an Electron-Flask architecture, which facilitates support for multiple operating systems and devices.
  • The backend, developed with Flask, serves as the central framework for application logic and for integration with external APIs (a minimal sketch follows this list).
  • The frontend is built as a cross-platform system to support desktop and mobile environments.
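As a concrete illustration of the Electron-Flask split, the sketch below shows a minimal Flask backend exposing a hypothetical /api/ocr endpoint that the frontend could call with an uploaded page image. The route name, request fields, and model stub are assumptions for illustration, not the application's actual API.

```python
# Minimal sketch of the Flask backend, assuming a hypothetical /api/ocr endpoint
# that the Electron frontend calls with an uploaded page image.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/ocr", methods=["POST"])
def ocr():
    # The frontend sends the handwritten page as a multipart file upload.
    image = request.files.get("image")
    if image is None:
        return jsonify({"error": "no image provided"}), 400
    # Placeholder: the real app would pass the image to the fine-tuned VLM.
    text = run_ocr_model(image.read())
    return jsonify({"markdown": text})

def run_ocr_model(image_bytes: bytes) -> str:
    # Stub so the sketch runs end to end; replace with actual model inference.
    return "# Recognized note\n(placeholder output)"

if __name__ == "__main__":
    app.run(port=5000)
```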

2 – Dataset Collection

OmniNote-1M

  • It includes 180k Page OCR samples, 172k Text Spotting examples, and 160k Text Recognition instances, which collectively enable robust document understanding.
  • For visual question answering (VQA), the dataset contains Natural Image VQA (443k samples), Document VQA (39k), and Chart VQA (40k).
  • The Document Parsing subset includes self-collected data from arXiv and Common Crawl for Page OCR and Text Spotting, while Text Recognition also incorporates synthesized handwritten data from public datasets such as CASIA-HWDB for Chinese and IAM for English.
  • Visual Question Answering leverages public datasets such as VQA v2, LRV_Chart, and TextVQA to ensure diversity in natural and document images.
  • We also filtered the dataset by extracting image features with YOLO-v10 and applying kNN clustering (sketched below).
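The following sketch illustrates one plausible reading of this filtering step: near-duplicate removal with k-nearest neighbours over pre-extracted image features. The feature source (a YOLO-v10 backbone), the cosine metric, and the distance threshold are assumptions, not the project's actual settings.

```python
# Sketch of kNN-based filtering over image features (assumed to be pre-extracted
# with a YOLO-v10 backbone and stored as an (N, D) array). Threshold and metric
# are illustrative choices.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def filter_near_duplicates(features: np.ndarray, threshold: float = 0.05) -> np.ndarray:
    """Return indices of samples kept after dropping near-duplicates."""
    nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(features)
    # distances[:, 0] is each point to itself; [:, 1] is its nearest other point.
    distances, neighbors = nn.kneighbors(features)
    keep: list[int] = []
    dropped: set[int] = set()
    for i, (dist, j) in enumerate(zip(distances[:, 1], neighbors[:, 1])):
        if i in dropped:
            continue
        keep.append(i)
        if dist < threshold:
            dropped.add(int(j))  # drop the neighbour that is nearly identical
    return np.array(keep)

features = np.random.rand(1000, 512).astype(np.float32)  # stand-in for YOLO features
kept = filter_near_duplicates(features)
print(f"kept {len(kept)} of {len(features)} samples")
```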

3 – Model Development

  • The raw document first passes through a feature extraction module that integrates the visual and textual features. The representation then flows into a hidden states generation component that produces a dense vector encoding of the document.
  • Separate decoder heads are employed for each target task, such as document understanding via question answering, element detection and localization, or document indexing and search.
  • By sharing the same encoder backbone while learning task-specific output layers, we can optimize the model for a wide range of document-processing objectives while benefiting from transfer learning and reduced architectural complexity (a toy sketch of this design follows this list).
  • We use two models: Qwen2VL-2B for visual question answering and GOT for document understanding. Both are fine-tuned following the workflow shown on the left.
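The toy PyTorch module below illustrates the shared-encoder, task-specific-heads idea described above. Layer sizes, head names, and output shapes are illustrative only; the actual system builds on Qwen2VL-2B and GOT rather than this simplified module.

```python
# Toy sketch of one shared encoder with task-specific heads, as described above.
# Dimensions and heads are illustrative, not the real Qwen2VL/GOT architecture.
import torch
import torch.nn as nn

class SharedEncoderModel(nn.Module):
    def __init__(self, feat_dim: int = 256, hidden_dim: int = 512, vocab: int = 1000):
        super().__init__()
        # Shared backbone: fuses (already extracted) visual+textual features
        # into a dense document representation.
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, hidden_dim)
        )
        # Task-specific heads trained on top of the same representation.
        self.qa_head = nn.Linear(hidden_dim, vocab)    # document QA (token logits)
        self.detect_head = nn.Linear(hidden_dim, 4)    # element localization (box)
        self.index_head = nn.Linear(hidden_dim, 128)   # embedding for search/indexing

    def forward(self, feats: torch.Tensor) -> dict[str, torch.Tensor]:
        h = self.encoder(feats)
        return {
            "qa_logits": self.qa_head(h),
            "boxes": self.detect_head(h),
            "index_embedding": self.index_head(h),
        }

model = SharedEncoderModel()
outputs = model(torch.randn(2, 256))  # batch of 2 fused feature vectors
print({k: v.shape for k, v in outputs.items()})
```

Because every head reads the same document representation, gradients from each task update the shared encoder, which is where the transfer-learning benefit described above comes from.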

Project Outcome

1 – OmniNote Desktop App

  • Canvas-based Note-Taking: Effortlessly capture and organize ideas using a versatile canvas, supporting text entry, drawing tools, and image embedding.
  • Hierarchical File Management: Create structured folders and manage your notes and files with intuitive drag-and-drop uploads and previews.
  • Interactive Model Integration: Leverage powerful AI models directly within the app for intelligent assistance with content creation and information extraction.
  • Responsive and Customizable Interface: Experience OmniNote across different devices with adaptive design.

2 – OmniNote Mobile App

  • The system is specifically designed to be extensible to mobile platforms, as handwritten input is more practical on devices such as iPads and smartphones.
  • The mobile frontend is developed with React Native and Expo, which enable applications for both iOS and Android to be built from a single codebase.

3 – Document Parsing Model

Our state-of-the-art document parsing model (Siglip-Qwen2) demonstrates exceptional accuracy across multiple data formats including text, equations, and tables. Notable advantages include:

  • Compact and Efficient: Achieves superior performance with significantly fewer parameters (0.8B) compared to larger models like GPT-4o and InternVL2-Llama3 (over 70B parameters).
  • Outstanding Text Recognition: Outperforms leading models with the lowest edit distances for both English (0.081) and Chinese (0.214) text (the metric is sketched after this list).
  • Versatile Parsing Capability: Delivers robust performance in extracting structured content from diverse document layouts and languages, demonstrating consistency and adaptability.
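The edit-distance figures above are normalized scores (lower is better). The sketch below shows one standard way to compute such a score, assuming Levenshtein distance normalized by the longer sequence; the project's exact evaluation protocol may differ.

```python
# Minimal sketch of a normalized edit distance, the kind of metric behind the
# 0.081 (English) and 0.214 (Chinese) scores above. Assumes standard Levenshtein
# distance normalized by the longer sequence.
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def normalized_edit_distance(pred: str, ref: str) -> float:
    denom = max(len(pred), len(ref)) or 1
    return edit_distance(pred, ref) / denom

print(normalized_edit_distance("handwrittn note", "handwritten note"))  # ~0.06
```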

Project Schedule

Phase | Period | Deliverables & Milestones | Status
------|--------|---------------------------|-------
0 | Aug – Sep 2024 | Research & Detailed Project Plan; Phase 0 Deliverables: Detailed Project Plan & Project Website Setup | Done
1 | Oct – Nov 2024 | Data Collection; Fine-tune the Vision Language Model (VLM) | Done
1 | Dec 2024 | Fine-tune the Large Language Model (LLM) for the DocQA task; Phase 1 Deliverables: Interim Report & First Presentation | Done
2 | Jan – Feb 2025 | Develop Application | Done
2 | Mar – Apr 2025 | Integrate VLM and LLM into Application; Phase 2 Deliverables: Application | Done
3 | Apr 2025 | Conduct User Experience Survey; Test and Refine the System; Phase 3 Deliverables: Final Report & Final Presentation | Done