Developing a Multi-Channel Hybrid Model for Indoor Scene Recognition Using Text-Based Classifiers and Convolutional Neural Networks
Abstract
İç mekân sahne tanıma, kapalı alanlardaki farklı ortamları (ofis, kütüphane, mutfak, restoran gibi) tanımlamak için kullanılan bir bilgisayarlı görü problemidir. Robotik, güvenlik, engelli bireylere yardım gibi uygulamalarda mekânı kategorize ederek ortama dair bağlamsal bilgi sağlaması açısından kapsamlı ve güncel bir araştırma alanıdır. Birçok bilgisayarlı görü probleminde olduğu gibi iç mekân sahne tanımada da çoğunlukla evrişimli sinir ağları kullanılmaktadır. Evrişimli sinir ağları (CNN), dış mekân sahne tanımada görselin (örneğin dağ, deniz veya gökyüzünün genel hatları gibi) genel özelliklerine kolayca odaklanarak nispeten daha başarılı iken; iç mekân sahne tanımada görselin yerel özelliklerine (mobilyalar, objeler, çeşitli nesneler gibi ayrıntılara) odaklanmada aynı yüksek başarıyı gösterememektedir. 'MIT 67 Indoor Scene' veri setinin kullanıldığı bu çalışmada önerilen iki kanallı hibrit modelde, evrişimli sinir ağları modelinden gelen özellikler ile nesne tanıma kelimeleri kullanılarak geliştirilen metin tabanlı modelden gelen özellikler birleştirilip eğitilmektedir. Doğal dil işleme ve görüntü işleme teknikleri bir arada kullanılarak geliştirilen bu hibrit model ile görüntü işleme modelinin test başarısı %9 arttırılarak yüksek bir başarı oranı elde edilmiştir.
Indoor scene recognition is the computer vision task of identifying different environments in enclosed spaces, such as offices, libraries, kitchens, and restaurants. It is a broad and active research area: in applications such as robotics, security, and assistive technologies for individuals with disabilities, categorizing a space provides contextual information about the environment. As in many computer vision problems, convolutional neural networks (CNNs) are the predominant approach to indoor scene recognition. CNNs are relatively successful in outdoor scene recognition, where they readily capture the global features of an image (e.g., the general outlines of mountains, seas, or skies), but they are less successful in indoor scene recognition, which depends on local features such as furniture and other objects. This study uses the MIT 67 Indoor Scene dataset and proposes a two-channel hybrid model in which features from a CNN are combined with features from a text-based model built on object recognition words, and the combined features are trained together. By using natural language processing (NLP) and image processing techniques together, the hybrid model raises the test accuracy of the image processing model by 9% and achieves a high success rate.
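The two-channel idea described in the abstract can be illustrated with a minimal sketch. The abstract states only that CNN features and text-model features are combined and trained together; the fusion by concatenation, the bag-of-words encoding, the vocabulary `VOCAB`, and the placeholder feature values below are all assumptions made for illustration, not details taken from the thesis.

```python
from collections import Counter

# Hypothetical vocabulary of object-recognition words (assumption, not from the thesis).
VOCAB = ["bed", "book", "chair", "oven", "shelf", "sink", "table"]

def text_channel(detected_objects):
    """Bag-of-words vector over VOCAB; words outside VOCAB are ignored."""
    counts = Counter(detected_objects)
    return [float(counts[word]) for word in VOCAB]

def l2_normalize(vec):
    """Scale a vector to unit length so both channels share a comparable range."""
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm > 0 else list(vec)

def fuse(cnn_features, detected_objects):
    """Concatenate the normalized image-channel and text-channel features (assumed fusion)."""
    return l2_normalize(cnn_features) + l2_normalize(text_channel(detected_objects))

# Placeholder values standing in for a CNN embedding of one image (assumption).
cnn_features = [0.2, 0.0, 1.3, 0.7]
# Words an object detector might emit for a library-like scene (assumption).
detected = ["book", "shelf", "book", "chair"]

fused = fuse(cnn_features, detected)
print(len(fused))  # length of the CNN vector plus the vocabulary size
```

In a sketch like this, the fused vector would be passed to a classifier over the 67 scene categories; normalizing each channel before concatenation is one simple way to keep either channel from dominating.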
Keywords
Bilgisayar Mühendisliği Bilimleri-Bilgisayar Ve Kontrol, Computer Engineering And Computer Science And Control
End Page
75