Bridging Vision and Understanding: The Central Role of Computer Vision in AI
Keywords:
Artificial Intelligence, Computer Vision, Machine Understanding
Abstract
Computer vision has become one of the most critical components in the advancement of artificial intelligence, enabling machines not only to perceive but also to interpret the world around them. This paper explores the central role of computer vision in bridging the gap between visual perception and higher-level machine understanding. By integrating deep learning, pattern recognition, and semantic interpretation, computer vision transforms raw visual data into structured knowledge that supports decision-making, reasoning, and autonomous behavior. The discussion highlights recent progress in image recognition, object detection, scene understanding, and multimodal learning, emphasizing how these innovations drive AI toward more human-like cognition. Furthermore, the paper addresses the challenges of scalability, generalization, and ethical implications, offering insights into future directions for research and applications.