Posts by Collection

publications

Face Detection in Camera Captured Images of Identity Documents Under Challenging Conditions

Published in International Conference on Document Analysis and Recognition Workshops (ICDARW), 2019

Face detection for identity documents under challenging capture conditions.

Recommended citation: S. Bakkali, Z. Ming, M. M. Luqman, J-C. Burie. "Face Detection in Camera Captured Images of Identity Documents Under Challenging Conditions." ICDARW 2019, Vol. 4, pp. 55-60. https://arxiv.org/pdf/1911.03567

Visual and Textual Deep Feature Fusion for Document Image Classification

Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020

Fusion of visual and textual deep features for document image classification.

Recommended citation: S. Bakkali, Z. Ming, M. Coustaty, M. Rusinol. "Visual and Textual Deep Feature Fusion for Document Image Classification." CVPRW 2020, pp. 562-563. https://openaccess.thecvf.com/content_CVPRW_2020/papers/w34/Bakkali_Visual_and_Textual_Deep_Feature_Fusion_for_Document_Image_Classification_CVPRW_2020_paper.pdf

Cross-modal Deep Networks for Document Image Classification

Published in IEEE International Conference on Image Processing (ICIP), 2020

Cross-modal deep networks for document image classification.

Recommended citation: S. Bakkali, Z. Ming, M. Coustaty, M. Rusinol. "Cross-modal Deep Networks for Document Image Classification." ICIP 2020, pp. 2556-2560. https://marcalr.github.io/pdfs/ICIP20.pdf

EAML: Ensemble Self-Attention-Based Mutual Learning Network for Document Image Classification

Published in International Journal on Document Analysis and Recognition (IJDAR), 2021

A mutual learning framework with ensemble self-attention for document image classification.

Recommended citation: S. Bakkali, Z. Ming, M. Coustaty, M. Rusinol. "EAML: Ensemble Self-Attention-Based Mutual Learning Network for Document Image Classification." IJDAR 24(3):251-268, 2021. https://arxiv.org/pdf/2305.06923

Multimodal Document Understanding with Unified Vision and Language Cross-Modal Learning

Published as a PhD thesis, Universite de La Rochelle, 2022

PhD thesis on unified vision-language learning for multimodal document understanding.

Recommended citation: S. Bakkali. "Multimodal Document Understanding with Unified Vision and Language Cross-Modal Learning." PhD Thesis, Universite de La Rochelle, 2022. https://theses.hal.science/tel-04197696/

VLCDoC: Vision-Language Contrastive Pre-training Model for Cross-Modal Document Classification

Published in Pattern Recognition, 2023

Vision-language contrastive pre-training for robust cross-modal document classification.

Recommended citation: S. Bakkali, Z. Ming, M. Coustaty, M. Rusinol, O. Ramos Terrades. "VLCDoC: Vision-Language Contrastive Pre-training Model for Cross-Modal Document Classification." Pattern Recognition 139:109419, 2023. https://arxiv.org/pdf/2205.12029

State-of-the-Art Khmer Text Recognition Using Deep Learning Models

Published in ASEAN Conference on Emerging Technologies, 2024

A survey of Khmer text recognition methods using deep learning.

Recommended citation: S. Keo, M. Coustaty, S. Bakkali, et al. "State-of-the-Art Khmer Text Recognition Using Deep Learning Models." ASEAN Conference on Emerging Technologies, 2024. https://hal.science/hal-05157225v1/file/ACET2024.pdf

LLMChain: Blockchain-based Reputation System for Sharing and Evaluating Large Language Models

Published in IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), 2024

A blockchain-based reputation mechanism for sharing and evaluating large language models.

Recommended citation: M. A. Bouchiha, Q. Telnoff, S. Bakkali, R. Champagnat, M. Rabah, M. Coustaty, Y. Ghamri-Doudane. "LLMChain: Blockchain-based Reputation System for Sharing and Evaluating Large Language Models." COMPSAC 2024, pp. 439-448. https://arxiv.org/pdf/2404.13236

Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting

Published in International Conference on Document Analysis and Recognition (ICDAR), 2024

Anytime early-exit strategies for efficient multimodal document classification.

Recommended citation: O. Hamed, S. Bakkali, M. Blaschko, S. Moens, J. Van Landeghem. "Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting." ICDAR 2024, pp. 270-286. https://arxiv.org/pdf/2405.12705

KhmerST: A Low-Resource Khmer Scene Text Detection and Recognition Benchmark

Published in Asian Conference on Computer Vision (ACCV), 2024

A benchmark for Khmer scene-text detection and recognition.

Recommended citation: V. Nom, S. Bakkali, M. M. Luqman, M. Coustaty, J-M. Ogier. "KhmerST: A Low-Resource Khmer Scene Text Detection and Recognition Benchmark." ACCV 2024, pp. 1777-1792. https://openaccess.thecvf.com/content/ACCV2024/papers/Nom_KhmerST_A_Low-Resource_Khmer_Scene_Text_Detection_and_Recognition_Benchmark_ACCV_2024_paper.pdf

GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification

Published in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025

A cross-modal vision-language framework for robust document image retrieval and classification in real-world settings.

Recommended citation: S. Bakkali, S. Biswas, Z. Ming, M. Coustaty, M. Rusinol, O. Ramos Terrades, J. Llados. "GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification." WACV 2025, pp. 1436-1446. https://openaccess.thecvf.com/content/WACV2025/papers/Bakkali_GlobalDoc_A_Cross-Modal_Vision-Language_Framework_for_Real-World_Document_Image_Retrieval_WACV_2025_paper.pdf

DocSum: Domain-Adaptive Pre-training for Document Abstractive Summarization

Published in Winter Conference on Applications of Computer Vision (WACV) Workshops, 2025

Domain-adaptive pre-training for document-level abstractive summarization.

Recommended citation: P. P. M. Chau, S. Bakkali, A. Doucet. "DocSum: Domain-Adaptive Pre-training for Document Abstractive Summarization." WACV Workshops 2025, pp. 1303-1312. https://openaccess.thecvf.com/content/WACV2025W/VISIONDOCS/papers/Chau_DocSum_Domain-Adaptive_Pre-training_for_Document_Abstractive_Summarization_WACVW_2025_paper.pdf

IDTrust: Deep Identity Document Quality Detection with Bandpass Filtering

Published in Winter Conference on Applications of Computer Vision (WACV) Workshops, 2025

Quality assessment for identity documents using bandpass filtering.

Recommended citation: M. Al-Ghadi, J. Voerman, S. Bakkali, M. Coustaty, O. Lessard, N. Sidere. "IDTrust: Deep Identity Document Quality Detection with Bandpass Filtering." WACV Workshops 2025, pp. 716-723. https://openaccess.thecvf.com/content/WACV2025W/AI4MFDD/papers/Al-Ghadi_IDTrust_Deep_Identity_Document_Quality_Detection_with_Bandpass_Filtering_WACVW_2025_paper.pdf

Fusion of GNN and GBDT Models for Graph and Node Classification

Published in International Workshop on Graph-Based Representations in Pattern Recognition (GbRPR), 2025

Hybrid graph learning by combining GNNs with gradient-boosted decision trees.

Recommended citation: M. Farhan, N. I. Kajla, M. D. A. Awan, M. M. Luqman, M. Coustaty, S. Bakkali. "Fusion of GNN and GBDT Models for Graph and Node Classification." GbRPR 2025, pp. 167-178. https://hal.science/hal-05111226v1/file/Fusion%20of%20GNN%20and%20GBDT%20Models%20%28Published%20Work%29.pdf

Evaluating the Impact of Khmer Font Types on Text Recognition

Published as an arXiv preprint, 2025

An empirical analysis of Khmer font characteristics for text recognition.

Recommended citation: V. Nom, S. Bakkali, M. M. Luqman, et al. "Evaluating the Impact of Khmer Font Types on Text Recognition." arXiv:2506.23963, 2025. https://arxiv.org/pdf/2506.23963

Confidence-based Knowledge Distillation to Reduce Training Costs and Carbon Footprint for Low-Resource Neural Machine Translation

Published in Applied Sciences, 2025

Confidence-based knowledge distillation strategies for low-resource neural machine translation.

Recommended citation: M. Zafar, P. J. Wall, S. Bakkali, et al. "Confidence-based Knowledge Distillation to Reduce Training Costs and Carbon Footprint for Low-Resource Neural Machine Translation." Applied Sciences 15(14):8091, 2025. https://www.mdpi.com/2076-3417/15/14/8091

WildKhmerST: A Comprehensive Benchmark Dataset for Khmer Scene Text Detection and Recognition

Published in International Conference on Document Analysis and Recognition (ICDAR), 2025

A large-scale benchmark for Khmer scene-text detection and recognition.

Recommended citation: S. Keo, V. Nom, S. Bakkali, M. M. Luqman, M. Rusinol, M. Coustaty, J-M. Ogier. "WildKhmerST: A Comprehensive Benchmark Dataset for Khmer Scene Text Detection and Recognition." ICDAR 2025, pp. 351-368. https://hal.science/hal-05120511/document

Visual Text Generation in Khmer Language: Challenges and Trends with Diffusion Models

Published in International Conference on Document Analysis and Recognition (ICDAR), 2025

A survey of diffusion-based approaches for Khmer visual text generation.

Recommended citation: S. Keo, V. Nom, S. Bakkali, M. M. Luqman, M. Coustaty, J-M. Ogier. "Visual Text Generation in Khmer Language: Challenges and Trends with Diffusion Models." ICDAR 2025, pp. 134-152. https://hal.science/hal-05178406/document

Cross-Lingual Learning for Low-Resource Khmer Scene Text Detection and Recognition

Published in International Conference on Document Analysis and Recognition (ICDAR), 2025

Cross-lingual learning strategies for Khmer scene-text recognition.

Recommended citation: V. Nom, S. Keo, S. Bakkali, et al. "Cross-Lingual Learning for Low-Resource Khmer Scene Text Detection and Recognition." ICDAR 2025, pp. 347-365. https://hal.science/hal-05191219/document

Hybrid Retrieval-Augmented Generation for Robust Multilingual Document Question Answering

Published as an arXiv preprint, 2025

Hybrid RAG strategies for multilingual document question answering.

Recommended citation: A. Mudet, S. Bakkali. "Hybrid Retrieval-Augmented Generation for Robust Multilingual Document Question Answering." arXiv:2512.12694, 2025. https://arxiv.org/pdf/2512.12694

Souhail Bakkali

Scientific Research (Legacy)

Research Activities

talks

Talks and Presentations

teaching

Teaching Experience