An Enhanced Note Assistance Application with Handwriting Recognition Leveraging LLMs


Supervisor: Prof. Luo Ping

Group Member: Deng Jiaqi (3035832490), Gu Zhuangcheng (3035827110), Xie Changhe (3035770575), Zhou Zihan (3035772640)

Introduction

In today’s world, many of us rely on handwritten notes, whether on paper or on digital devices such as iPads. However, interpreting these notes afterwards can be challenging. Existing note-taking applications mostly offer basic functionality such as text transcription and keyword search, and do not handle diverse inputs such as handwriting, sketches, and complex visual data well. They also lack context-aware querying and intelligent content categorization, which makes it difficult for users to organize and interact with notes that mix text, diagrams, and annotations.

Therefore, we introduce a note assistant application that combines the strengths of general-purpose and specialized LLMs to offer a more interactive and structured note-management experience. This project presents a tool that transforms the note-taking process through handwriting recognition, sketch conversion, and note question-answering (QA). Using large language models (LLMs) and optical character recognition (OCR), the application converts handwritten drafts into organized, searchable, and insightful digital notes: it turns rough sketches into clean formats such as Markdown and lets users query their notes for deeper insights.

Objective

This note assistant application aims to make organizing and understanding notes more efficient and effective for users. The project pursues three objectives.

Objective 1: Written Content Recognition

The project fine-tunes a Vision Language Model (VLM) to detect and recognize user-written content, including both text and graphical elements, enabling handwritten drafts and notes to be converted into neatly structured digital formats.
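As a point of reference while the VLM is being fine-tuned, the recognition step can be prototyped with an off-the-shelf handwriting OCR model. The sketch below is illustrative only: the model name `microsoft/trocr-base-handwritten` and the input file `note_line.png` are assumptions, not the project's final model or data.

```python
# Minimal sketch: transcribe one line of handwriting with a pretrained
# TrOCR model, as a stand-in for the fine-tuned VLM.
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

image = Image.open("note_line.png").convert("RGB")  # a cropped line of handwriting
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)  # the transcribed line, ready to be assembled into Markdown
```

In the full pipeline, transcribed lines and recognized graphical elements would be assembled into a structured Markdown document rather than printed.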

Objective 2: Document QA

This project fine-tunes a Large Language Model (LLM) for document question-answering (DocQA) in lecture note-taking use cases. Combined with techniques such as Retrieval-Augmented Generation (RAG), the model supports document understanding and lets users query their notes for detailed insights and information.
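To make the RAG step concrete, the sketch below shows one common retrieval pattern: embed note chunks once, fetch the most similar chunks for a query, and build a grounded prompt for the LLM. The embedding model name, the sample chunks, and the final LLM call are illustrative assumptions, not the project's chosen components.

```python
# Minimal RAG sketch: retrieve top-k note chunks by cosine similarity,
# then construct a context-grounded prompt for the DocQA LLM.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

chunks = [
    "Lecture 3: backpropagation computes gradients layer by layer.",
    "Lecture 4: Adam combines momentum with per-parameter learning rates.",
]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)  # unit vectors

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q  # cosine similarity, since vectors are normalized
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

question = "How does Adam differ from plain gradient descent?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# `prompt` would then be sent to the fine-tuned DocQA LLM.
```

Grounding the prompt in retrieved chunks keeps answers tied to the user's own notes instead of the model's general knowledge.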

Objective 3: Interactive Application

This project delivers an interactive mobile/desktop application that seamlessly integrates the recognition and QA capabilities above, transforming handwritten content into clean, organized, searchable, and easy-to-understand digital notes.
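One plausible way to wire these pieces together is a small backend that the client app calls over HTTP. The FastAPI sketch below is an assumption about the architecture, not the project's final design; `recognize_page` and `answer_question` are hypothetical stand-ins for the fine-tuned VLM and the RAG-backed DocQA LLM.

```python
# Illustrative integration seam: two endpoints the app client could call.
from fastapi import FastAPI, UploadFile

app = FastAPI()

def recognize_page(image_bytes: bytes) -> str:
    # Hypothetical stand-in for the fine-tuned VLM: returns Markdown.
    return "# Recognized note\n\n(transcribed content would appear here)"

def answer_question(question: str) -> str:
    # Hypothetical stand-in for the RAG-backed DocQA LLM.
    return f"(answer to: {question})"

@app.post("/recognize")
async def recognize(file: UploadFile) -> dict:
    # Client uploads a page image; server returns structured Markdown.
    markdown = recognize_page(await file.read())
    return {"markdown": markdown}

@app.get("/ask")
def ask(question: str) -> dict:
    # Client asks a free-form question about the stored notes.
    return {"answer": answer_question(question)}
```

A client would POST a page image to /recognize and GET /ask with a question string; this shows only the seam between recognition and QA, not the final application.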

Project Schedule

| Phase | Period | Deliverables & Milestones | Status |
|-------|--------|---------------------------|--------|
| 0 | Aug – Sep 2024 | Research & Detailed Project Plan. Phase 0 Deliverables: Detailed Project Plan & Project Website Setup | Done |
| 1 | Oct – Nov 2024 | Data Collection; Fine-tune the Vision Language Model (VLM) | In progress |
| 1 | Dec 2024 | Fine-tune the Large Language Model (LLM) for the DocQA task. Phase 1 Deliverables: Interim Report & First Presentation | To do |
| 2 | Jan – Feb 2025 | Develop Application | To do |
| 2 | Mar – Apr 2025 | Integrate VLM and LLM into Application. Phase 2 Deliverables: Application | To do |
| 3 | May 2025 | Conduct User Experience Survey; Test and Refine the System. Phase 3 Deliverables: Final Report & Final Presentation | To do |