Bridging Vision and Understanding: The Central Role of Computer Vision in AI

Authors

  • Nurul Hidayati, Universitas Amikom, Yogyakarta

Keywords:

Artificial Intelligence, Computer Vision, Machine Understanding

Abstract

Computer vision has become one of the most critical components in the advancement of artificial intelligence, enabling machines not only to perceive but also to interpret the world around them. This paper explores the central role of computer vision in bridging the gap between visual perception and higher-level machine understanding. By integrating deep learning, pattern recognition, and semantic interpretation, computer vision transforms raw visual data into structured knowledge that supports decision-making, reasoning, and autonomous behavior. The discussion highlights recent progress in image recognition, object detection, scene understanding, and multimodal learning, emphasizing how these innovations drive AI toward more human-like cognition. Furthermore, the paper addresses the challenges of scalability and generalization, as well as ethical implications, and offers insights into future directions for research and applications.

Published

2025-06-25

How to Cite

Bridging Vision and Understanding: The Central Role of Computer Vision in AI. (2025). Journal of Information Systems and Technology, 1(1), 1-8. https://athallahpublishing.com/index.php/jistech/article/view/37