Title: Real-Time Food Allergen Detection Using OCR-Enhanced Machine Learning Techniques
Author: Kina, Erol
Date accessioned: 2025-12-30
Date available: 2025-12-30
Date issued: 2025
ISSN: 2376-5992
DOI: 10.7717/peerj-cs.3338
URI: https://doi.org/10.7717/peerj-cs.3338
URI: https://hdl.handle.net/20.500.14720/29315
Type: Article
Volume: 11
Language: en
Rights: info:eu-repo/semantics/openAccess
Keywords: Food Allergy; Food Reaction; Machine Learning; OCR; Feature Extraction
Quartile: Q2; Q1
WOS ID: WOS:001639365900001

Abstract:
Food allergies are a significant public health concern, which makes precise and comprehensive identification of allergens in food products essential. Despite the importance of allergen detection, existing food allergen datasets and detection approaches have several limitations, including small dataset sizes and low accuracy, particularly in real-time scenarios. To address these challenges, this study proposes a novel machine learning-based system and evaluates it in both real-time and offline settings. The system analyzes ingredient lists extracted from scanned product labels: Optical Character Recognition (OCR) retrieves the ingredient text in real time, and allergenic components are then identified from that text. After OCR extraction, feature extraction techniques such as Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and Global Vectors for Word Representation (GloVe) are applied, and the resulting features are used to train various machine learning and deep learning models. Among the tested models, Logistic Regression (LR) performed best, achieving an accuracy of 0.99 with a computation time of 13 milliseconds in offline testing. In real-time testing, where product images are captured and passed through the full pipeline, the system achieved an accuracy of 0.90.
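The abstract names Bag of Words and TF-IDF as feature extractors applied to the OCR output. As a minimal sketch, assuming the ingredient text has already been extracted by OCR and using the unsmoothed weighting tf(t, d) · ln(N / df(t)) (the exact variant used in the study is not stated here; libraries such as scikit-learn apply a smoothed idf by default), the two representations could be computed as follows. The function names and sample ingredient lists are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase OCR'd ingredient text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return a sorted vocabulary and one raw term-count vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = []
    for toks in tokenized:
        freq = Counter(toks)
        counts.append([freq[term] for term in vocab])
    return vocab, counts

def tf_idf(counts: list[list[int]]) -> list[list[float]]:
    """Weight each count by tf(t, d) * ln(N / df(t)), with tf normalized by document length."""
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    # Document frequency: number of documents in which each term appears.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    weights = []
    for row in counts:
        length = sum(row) or 1  # guard against an empty OCR result
        weights.append([(c / length) * math.log(n_docs / df[j]) for j, c in enumerate(row)])
    return weights

# Hypothetical ingredient lists, as they might come out of the OCR step.
docs = ["Sugar, peanuts, salt", "Wheat flour, sugar, milk", "Rice, salt"]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
```

A term that occurs in only one ingredient list, such as peanuts in the first sample, receives a higher weight than terms shared across lists, such as salt or sugar. This is the property that lets TF-IDF emphasize distinctive, and potentially allergenic, ingredients over common ones.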
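Logistic Regression is reported as the best-performing model. The sketch below is not the author's implementation: it trains a logistic regression classifier by batch gradient descent on binary bag-of-words features, and the four labeled ingredient lists, the label meaning "contains peanuts", the learning rate, and the epoch count are all hypothetical choices for illustration. In practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used.

```python
import math
import re

def tokenize(text: str) -> list[str]:
    """Lowercase OCR'd ingredient text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def featurize(text: str, vocab: list[str]) -> list[float]:
    """Binary bag-of-words vector: 1.0 if a vocabulary term occurs in the text."""
    present = set(tokenize(text))
    return [1.0 if term in present else 0.0 for term in vocab]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Minimize mean log-loss with batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss with respect to the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(text: str, vocab: list[str], w: list[float], b: float) -> float:
    """Probability that the ingredient text belongs to the positive (allergen) class."""
    x = featurize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical training data: label 1 means "contains peanuts".
train = [("sugar, peanuts", 1), ("salt, peanuts", 1), ("sugar, rice", 0), ("salt, rice", 0)]
vocab = sorted({tok for text, _ in train for tok in tokenize(text)})
X = [featurize(text, vocab) for text, _ in train]
y = [label for _, label in train]
w, b = train_logreg(X, y)

flagged = predict_proba("peanuts", vocab, w, b) > 0.5
```

Because peanuts appears only in positive examples, its learned weight is positive, so a new label reading "peanuts" is flagged at the 0.5 threshold while "rice" is not. A linear model over sparse word features like this is cheap to evaluate, which is consistent with the low per-prediction cost the abstract reports for LR, though the sketch makes no claim about the study's actual timings.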