Understanding the 5S Methodology Through Serious Games
A heartfelt acknowledgment to Mouad Barras for his insightful, applied, and forward-thinking research. His work serves as a vital link between AI innovation and industrial training, demonstrating how artificial intelligence can enhance and support human learning rather than replace it.
This article revisits and expands upon his study, offering a more human-centered narrative that bridges technical depth with real-world significance, making the insights accessible to researchers, educators, and industry professionals alike.
Introduction: From Discipline to Intelligence
In modern industries, training efficiency and operational excellence are inseparable. The Japanese 5S methodology (Sort, Set in Order, Shine, Standardize, and Sustain) remains one of the most effective frameworks for achieving productivity and safety. Yet the real challenge lies in how people learn and retain these principles in the workplace.
Traditional lectures and checklists are often static. They don’t simulate the complexity, spatial reasoning, or instant decision-making that real industrial settings demand.
That’s where Artificial Intelligence (AI) and serious games enter the scene.
By integrating real-time object detection within an interactive game-based learning platform, trainees can visually engage with the environment, get instant feedback on their actions, and internalize the 5S process dynamically.
Mouad Barras’s research explores precisely this: deploying object detection models on Android mobile devices, offline, to support professional training through gamified experiences.
Dataset: The Foundation of Intelligent Learning
Every AI system is only as good as the data that shapes it. In this research, the dataset design and collection played a crucial role in achieving accurate and realistic performance for 5S-based learning.
Dataset Composition and Structure
The dataset used in the study was specifically curated for industrial training environments, with an emphasis on visual realism, object diversity, and scenario-based labeling.
It contains multiple 5S scenarios, but Scenario 2 — Seiton (Set in Order) was the most extensive and representative one.
It was therefore chosen as the benchmark for testing and model comparison.
The Seiton dataset was particularly challenging due to its realistic visual complexity — different lighting conditions, varied camera angles, partial occlusions, and a wide range of object classes such as tools, containers, cables, and safety equipment.
Annotation and Format
Each image was annotated using the YOLO format, which defines:
- Bounding box coordinates (x_center, y_center, width, height)
- Class ID (e.g., tool, box, marker)
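For readers new to the format, here is a minimal Python sketch of parsing one such annotation line. The class names are hypothetical placeholders echoing the examples above; the actual dataset defines its own label map.

```python
# Hypothetical class list echoing the article's examples; the real
# dataset defines its own class IDs.
CLASS_NAMES = ["tool", "box", "marker"]

def parse_yolo_label(line: str) -> dict:
    """Parse one line of a YOLO-format .txt annotation.

    Layout: "<class_id> <x_center> <y_center> <width> <height>",
    with all box values normalized to [0, 1] by the image size.
    """
    class_id, x_center, y_center, width, height = line.split()
    return {
        "class": CLASS_NAMES[int(class_id)],
        "x_center": float(x_center),
        "y_center": float(y_center),
        "width": float(width),
        "height": float(height),
    }

# Example: a "tool" centered slightly left of the image middle.
print(parse_yolo_label("0 0.42 0.55 0.10 0.08"))
```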
The dataset ensured:
- Balanced class distribution, avoiding bias toward overrepresented items
- High annotation precision, verified through visual inspection
- Train/validation/test splits optimized for generalization
This rigorous dataset design allowed the models (YOLOv8, YOLOv9, Faster R-CNN) to learn robust spatial and semantic representations of industrial objects in context.
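The study's data-preparation scripts are not published, but a balanced class distribution of the kind described above can be verified with a short script like this sketch (directory layout and YOLO-format labels assumed):

```python
from collections import Counter
from pathlib import Path

def class_distribution(labels_dir: str) -> Counter:
    """Count how often each class ID appears across YOLO .txt labels,
    a quick way to spot over- or under-represented classes."""
    counts = Counter()
    for label_file in Path(labels_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1
    return counts

# Hypothetical path; adjust to the real dataset layout.
print(class_distribution("datasets/seiton/labels/train"))
```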
Why the Dataset Matters
Unlike common object detection datasets (COCO, Pascal VOC), this dataset is domain-specific — tailored for industrial organization and cleanliness training.
That means the AI must understand not just what an object is, but also how it should be positioned according to the 5S principle.
This makes the dataset both technically challenging and pedagogically rich — a true reflection of real-world working environments.
Evaluated Models
Three leading object detection architectures were tested:
- YOLOv8 — Ultralytics’ latest one-stage detector, offering high speed and accuracy.
- YOLOv9 — Improved backbone and anchor-free design, focusing on precision.
- Faster R-CNN — Two-stage detector, well-known for accuracy but resource-intensive.
Deployment Frameworks
Given that the app must run offline on Android, the research compared:
- TensorFlow Lite (TFLite) — widely supported, lightweight TensorFlow runtime.
- NCNN — Tencent’s open-source, high-performance inference framework optimized for mobile CPUs/GPUs.
Evaluation Environment
Each model was trained and tested on the Seiton dataset using the same parameters, then exported and deployed on Android devices. Metrics such as inference speed, FPS, latency, model size, and compatibility were analyzed.
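The study's exact hyperparameters are not reproduced here, but with the Ultralytics API the protocol looks roughly like the sketch below, where seiton.yaml is a hypothetical data config and the epoch count, image size, and seed are placeholders rather than the actual training settings:

```python
from ultralytics import YOLO

# Train each model with identical settings, per the protocol above.
model = YOLO("yolov8n.pt")  # pretrained YOLOv8 checkpoint
model.train(data="seiton.yaml", epochs=100, imgsz=640, seed=0)

# Evaluate on the held-out split to obtain mAP and related metrics.
metrics = model.val(split="test")
print(metrics.box.map50, metrics.box.map)  # mAP@0.5 and mAP@0.5:0.95
```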
Conversion, Optimization, and Integration into Android Application
Option 1: NCNN Conversion
The first approach employed NCNN, a high-performance neural network inference framework for mobile devices developed by Tencent.
The process began by converting the trained YOLO model weights from the PyTorch (.pt) format into the Open Neural Network Exchange (.onnx) format through Python scripts.
Once the ONNX model was obtained, it was converted into the NCNN format, generating two essential files:
- A binary weights file (.bin)
- A network parameter file (.param)
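The paper's conversion scripts are not included here, but the two-step pipeline can be sketched with the Ultralytics export API and NCNN's standard onnx2ncnn command-line tool; the weight paths and ONNX opset below are illustrative:

```python
import subprocess
from ultralytics import YOLO

# Step 1: PyTorch (.pt) -> ONNX (.onnx). export() returns the path
# of the exported file.
model = YOLO("runs/detect/train/weights/best.pt")
onnx_path = model.export(format="onnx", opset=12)

# Step 2: ONNX -> NCNN (.param + .bin) via NCNN's onnx2ncnn tool,
# built from the NCNN repository and assumed to be on the PATH.
subprocess.run(
    ["onnx2ncnn", str(onnx_path), "best.param", "best.bin"],
    check=True,
)
```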
The NCNN framework was then embedded into the mobile application using C++ and CMake, ensuring optimal runtime performance on Android devices.
The Java layer handled the application logic and interfaced with the C++ libraries through CMake, while also managing the user interface (UI) for a responsive and user-friendly experience.
All development and integration were carried out within Android Studio, which provided a smooth environment for compilation, testing, and optimization.
This pipeline enabled real-time inference and ensured the highest model speed, making the system suitable for interactive object detection in mobile training applications.
Option 2: TensorFlow Lite (TFLite) Conversion
The second deployment strategy focused on TensorFlow Lite (TFLite), a lightweight and efficient framework for running TensorFlow models on mobile and embedded devices.
Two models — YOLO (.pt) and Faster R-CNN (.h5) — were directly converted into the TFLite (.tflite) format for deployment.
In this setup, Java was primarily used to manage the deployment and interaction with the TFLite interpreter, simplifying the integration process.
Each model’s associated object labels were stored in a separate .txt file, which the application used to map detections to class names during inference.
This method leveraged TensorFlow Lite’s native tools, taking advantage of its compatibility and stability across Android devices.
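Both conversions happen in Python before the models reach the device. A minimal sketch, assuming placeholder file names for the trained weights:

```python
import tensorflow as tf
from ultralytics import YOLO

# YOLO path: Ultralytics drives the .pt -> TFLite chain internally
# (the output filename varies by version, e.g. best_float32.tflite).
yolo = YOLO("runs/detect/train/weights/best.pt")
yolo.export(format="tflite")

# Keras path: a .h5 model converts via the standard TFLite converter.
# A real Faster R-CNN may need custom_objects to load its layers.
keras_model = tf.keras.models.load_model("faster_rcnn.h5")
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional quantization
tflite_bytes = converter.convert()

with open("faster_rcnn.tflite", "wb") as f:
    f.write(tflite_bytes)
```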
Performance Comparison
After successful conversion and integration, the performance of each model was compared to determine which configuration was most suitable for mobile deployment.
The evaluation criteria went beyond accuracy alone; they also included:
- Speed and inference time
- Resource utilization (CPU/GPU efficiency)
- Ease of integration and maintainability
These combined factors helped identify the best-performing model–framework pair, ensuring the highest possible performance for real-time object detection in all 5S training scenarios.
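As a rough desktop-side proxy for these measurements, the latency and FPS of a converted TFLite model can be timed with the standard interpreter API; the model path is hypothetical and random input stands in for real camera frames:

```python
import time
import numpy as np
import tensorflow as tf

def benchmark_tflite(model_path: str, runs: int = 100) -> None:
    """Average per-frame latency and FPS for a .tflite model."""
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    dummy = np.random.rand(*inp["shape"]).astype(inp["dtype"])

    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()  # warm-up run

    start = time.perf_counter()
    for _ in range(runs):
        interpreter.set_tensor(inp["index"], dummy)
        interpreter.invoke()
    latency = (time.perf_counter() - start) / runs
    print(f"{latency * 1000:.1f} ms/frame -> {1 / latency:.1f} FPS")

benchmark_tflite("best_float32.tflite")  # hypothetical path
```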
Results and Analysis
Key Findings
Despite YOLOv9’s slightly higher numerical metrics in F1-Score (95% vs 93%) and mAP (97.26% vs 96.5%), YOLOv8 emerged as the best practical model for this application. This decision was driven by critical factors beyond raw performance numbers, including stability, cross-platform compatibility, and reliable offline operation capabilities.
The evaluation revealed that deployment feasibility is equally important as accuracy metrics when selecting models for real-world applications. While YOLOv9 showed marginal improvements in detection accuracy, its incompatibility with mobile optimization frameworks (NCNN and TensorFlow Lite) made it impractical for the target deployment environment.
Faster R-CNN demonstrated the weakest performance across all metrics, with an F1-Score of 89.8%, mAP of 88.1%, and Recall of 87.9%. More critically, its high computational latency makes it fundamentally unsuitable for real-time object detection scenarios required in interactive training applications.
The benchmarks show that YOLOv8 performs notably better with NCNN than with TFLite, with lower inference times and higher FPS, making that pairing the strongest candidate for real-time use. Faster R-CNN, by contrast, was considerably slower on TFLite and is likely unsuitable for real-time applications.
Optimal Configuration: YOLOv8 + NCNN
When paired with NCNN (Neural Network Compute Framework), YOLOv8 demonstrated superior practical performance characteristics:
- Faster frame rates: Optimized inference speed enabling smooth real-time processing
- Lower latency: Minimal delay between detection and response, critical for user interaction
- Real-time feedback: Immediate visual feedback ideal for interactive learning environments
- Mobile deployment: Seamless conversion and execution on resource-constrained devices
- Offline capability: Full functionality without network connectivity requirements
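The on-device integration is done in C++ and Java as described earlier, but NCNN also ships Python bindings that make it easy to sanity-check a converted .param/.bin pair on a desktop before embedding it. A sketch, with illustrative file paths and blob names:

```python
import ncnn  # pip install ncnn (Python bindings for the C++ runtime)
import numpy as np

# Load the converted pair produced by onnx2ncnn (illustrative paths).
net = ncnn.Net()
net.load_param("best.param")
net.load_model("best.bin")

# A dummy 640x640 RGB frame standing in for a camera image;
# mean/scale normalization is omitted for brevity.
img = np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8)
mat = ncnn.Mat.from_pixels(img, ncnn.Mat.PixelType.PIXEL_RGB, 640, 640)

# Blob names depend on the exported graph; read them from the .param
# file rather than trusting the "in0"/"out0" guesses used here.
ex = net.create_extractor()
ex.input("in0", mat)
ret, out = ex.extract("out0")
print(ret, np.array(out).shape)
```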
Significance: Beyond Numbers
Practical Implications
The system allows trainees to:
- Interact with a simulated workplace
- Organize objects virtually according to 5S rules
- Receive immediate AI-based feedback
- Track progress in real time, even offline
This transforms industrial training into an immersive, game-based experience, increasing retention and motivation.
Educational and Ethical Aspects
While enhancing learning, the use of AI requires:
- Privacy safeguards for any captured images
- Bias monitoring to avoid misclassification under different lighting or contexts
- Continuous retraining as new equipment or layouts appear
- Human oversight to ensure the AI complements, not replaces, human judgment
Conclusion: The Path Ahead
The research demonstrates that AI-powered serious games can meaningfully enhance professional learning by turning theoretical concepts into interactive experiences.
Among the tested models, YOLOv8 integrated with NCNN stood out as the ideal balance between speed, accuracy, and deployment readiness. The carefully curated Seiton dataset provided a challenging and realistic foundation, proving that high-quality, domain-specific data is as important as the model itself.
In general, by including object detection in the serious game, trainees can visually recognize labeled items and verify whether they are placed in the right spots according to the 5S methodology. This approach not only reinforces the trainee's comprehension of the framework but also provides real-time feedback that helps them apply the methodology correctly in the real world. It enriches the learning experience by combining theoretical knowledge with hands-on application, making training more effective and relevant.
This study highlights a paradigm shift:
From teaching rules to enabling real-time guided practice, powered by artificial intelligence.
It’s an inspiring step toward human-AI collaboration in industrial education.
Credits and Acknowledgment
Special thanks once again to Mouad Barras — the original author of the study that inspired this article.
Read the original publication on ResearchGate
This extended version was written and humanized by Mustapha Ait Igunaoun, researcher in Computer Vision, Deep Learning, and Intelligent Transportation Systems, integrating insights from applied AI deployment and educational design.
If you enjoyed this analysis, follow me for more explorations into AI-driven learning, intelligent systems, and real-world deep learning applications.
Together, let’s make AI not just smarter — but more human.
Thanks for reading! If you enjoyed this breakdown, consider following me for more deep dives into computer vision, AI, and the future of machine learning.
If you’d like to learn more or collaborate, feel free to connect with me. I suggest following me on LinkedIn, where I am most active:
- GitHub: https://github.com/mustaphaAitigunaoun
- LinkedIn: https://www.linkedin.com/in/mustapha-ait-igunaoun/
Stay tuned for more deep dives into object detection, tracking, and AI research.