Developing a Multi-Channel Hybrid Model for Indoor Scene Recognition Using Text-Based Classifiers and Convolutional Neural Networks

Aslan, Cengiz

Date

2025

Authors

Aslan, Cengiz

Abstract

Indoor scene recognition is a computer vision problem concerned with identifying the type of an enclosed environment, such as an office, library, kitchen, or restaurant. It is a broad and active research area: in applications such as robotics, security, and assistive technologies for individuals with disabilities, categorizing a space provides contextual information about the environment. As in many computer vision problems, convolutional neural networks (CNNs) are the predominant approach to indoor scene recognition. CNNs perform relatively well in outdoor scene recognition, where they can readily focus on global features of an image (e.g., the general outlines of mountains, sea, or sky), but they do not reach the same level of success in indoor scene recognition, which depends on local features such as furniture and other objects. This study uses the MIT 67 Indoor Scene dataset and proposes a two-channel hybrid model in which features from a CNN are combined with features from a text-based model built on object recognition words, and the combined features are trained together. By using natural language processing (NLP) and image processing techniques together, the hybrid model raises the test accuracy of the image processing model by 9%, achieving a high success rate.
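
The two-channel idea in the abstract can be illustrated with a minimal sketch. It assumes fusion by simple concatenation and a bag-of-words encoding of detected object labels; the abstract does not state which fusion or text encoding the thesis uses, so both choices are assumptions. The vocabulary, the feature values, and the function names `text_channel` and `fuse` are hypothetical, introduced only for illustration.

```python
from collections import Counter

# Hypothetical vocabulary of object labels (not taken from the thesis).
VOCAB = ["bed", "book", "chair", "oven", "shelf", "table"]


def text_channel(detected_objects, vocab=VOCAB):
    """Encode the object words detected in one image as a count vector."""
    counts = Counter(detected_objects)
    return [float(counts[word]) for word in vocab]


def fuse(cnn_features, text_features):
    """Combine both channels into one vector (concatenation is an assumption)."""
    return list(cnn_features) + list(text_features)


# Stand-in for a CNN feature vector; a real one would be far longer.
cnn_features = [0.12, 0.80, 0.05]
# Stand-in for the words produced by an object recognizer on the same image.
objects = ["book", "shelf", "book", "chair"]

fused = fuse(cnn_features, text_channel(objects))
print(fused)
```

In a setup like this, the fused vector would be passed to a classifier trained on the scene labels, so that object words such as "book" and "shelf" can supply the local cues the image channel alone tends to miss.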

Keywords

Computer Engineering and Computer Science and Control

End Page

75

URI

https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=5NNqZKwwGohPh6_KCcfp-hL_gPrvwdxG4jDw2tsXkNG1aqMH6GGBU4D1rnqfLRPt
https://hdl.handle.net/20.500.14720/28196

Collections

Master's Theses
