Supervisor: Prof. Luo Ping
Group Members: Deng Jiaqi (3035832490), Gu Zhuangcheng (3035827110), Xie Changhe (3035770575), Zhou Zihan (3035772640)
Introduction
Many of us rely on handwritten notes, whether on paper or on digital devices such as iPads, yet interpreting these notes later can be challenging. Existing note-taking applications mostly focus on basic functionality such as text transcription and keyword search; they do not handle diverse inputs such as handwritten text, sketches, and complex visual data effectively. They also lack context-aware querying and intelligent content categorization, which makes it difficult for users to organize and interact with notes that combine text, diagrams, and annotations.
We therefore introduce a note assistant application that combines the strengths of general-purpose and specialized large language models (LLMs) to offer a more interactive and structured note-management experience. The project presents a tool that transforms the note-taking process through handwriting recognition, sketch conversion, and note question answering (QA). Using LLMs and optical character recognition (OCR), the application converts handwritten drafts into organized, searchable, and insightful digital notes: it turns rough sketches into clean formats such as Markdown and enables users to query their notes for deeper insights.
Objective
The note assistant application aims to help users organize and understand their notes more efficiently and effectively, through the following three objectives.
Objective 1: Written Content Recognition
The project fine-tunes a Vision Language Model (VLM) to detect and recognize user-written content, including both text and graphical elements, enabling the conversion of handwritten drafts and notes into neatly structured digital formats.
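As a rough illustration of this objective (not the project's final pipeline, which will also handle graphical elements), the sketch below runs an off-the-shelf handwritten-text recognizer, Microsoft's TrOCR via Hugging Face Transformers, on a single cropped line of handwriting. The model choice, file name, and single-line input are assumptions for the sketch.

```python
# Minimal sketch: recognize one line of handwriting with a pre-trained model.
# The model and input are illustrative; the project's fine-tuned VLM will differ.
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

image = Image.open("note_line.png").convert("RGB")  # one cropped line of a note
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```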
Objective 2: Document QA
This project fine-tunes a Large Language Model (LLM) for document question answering (DocQA) in lecture note-taking use cases. Combined with techniques such as Retrieval-Augmented Generation (RAG), the model supports document understanding and allows users to query their notes for detailed insights and information.
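To make the RAG component concrete, the sketch below shows its retrieval step: note chunks are embedded once, and the chunks most similar to a query are selected to be inserted into the LLM prompt. The embedding model, sample chunks, and top-k value are illustrative assumptions, not the project's final design.

```python
# Minimal sketch of RAG retrieval, assuming sentence-transformers is available.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# Stand-ins for chunks extracted from a user's lecture notes.
chunks = [
    "Gradient descent updates weights along the negative gradient.",
    "A confusion matrix summarizes classification errors.",
    "Backpropagation computes gradients layer by layer via the chain rule.",
]
chunk_embeddings = embedder.encode(chunks, convert_to_tensor=True)

query = "How are gradients computed in neural networks?"
query_embedding = embedder.encode(query, convert_to_tensor=True)

# Retrieve the top-2 chunks; these would be prepended to the LLM prompt.
hits = util.semantic_search(query_embedding, chunk_embeddings, top_k=2)[0]
for hit in hits:
    print(f'{hit["score"]:.3f}  {chunks[hit["corpus_id"]]}')
```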
Objective 3: Interactive Application
This project delivers an interactive mobile/desktop application that seamlessly integrates the OCR and QA capabilities above, transforming handwritten content into clean, organized, searchable, and easy-to-understand digital notes.
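One way this integration could look on the backend is a single endpoint that chains OCR and QA. The sketch below uses FastAPI as an assumed framework choice; ocr_image and answer_query are hypothetical stubs standing in for the models above, not implemented functions.

```python
# Minimal sketch of the backend integration; FastAPI is an assumed choice,
# and the two helpers are hypothetical stubs for the VLM and LLM components.
from fastapi import FastAPI, UploadFile

app = FastAPI()

def ocr_image(data: bytes) -> str:
    """Stub: run the VLM on an uploaded page and return structured text."""
    raise NotImplementedError

def answer_query(note_text: str, question: str) -> str:
    """Stub: retrieve relevant chunks from note_text and query the LLM."""
    raise NotImplementedError

@app.post("/notes/qa")
async def note_qa(file: UploadFile, question: str) -> dict:
    # Convert the handwritten page to text, then answer the user's question.
    text = ocr_image(await file.read())
    return {"answer": answer_query(text, question)}
```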
Project Schedule
Phase | Period | Deliverables & Milestones | Status |
---|---|---|---|
0 | Aug – Sep 2024 | Research & Detailed Project Plan. Phase 0 Deliverables: Detailed Project Plan & Project Website Setup | Done |
1 | Oct – Nov 2024 | Data Collection; Fine-tune the Vision Language Model (VLM) | Doing |
1 | Dec 2024 | Fine-tune the Large Language Model (LLM) for the DocQA Task. Phase 1 Deliverables: Interim Report & First Presentation | Todo |
2 | Jan – Feb 2025 | Develop Application | Todo |
2 | Mar – Apr 2025 | Integrate VLM and LLM into Application. Phase 2 Deliverables: Application | Todo |
3 | May 2025 | Conduct User Experience Survey; Test and Refine the System. Phase 3 Deliverables: Final Report & Final Presentation | Todo |