We are seeking a highly skilled AI Quality Engineer to join our National Large Language Model (LLM) Project. This key role will focus on establishing and implementing robust data quality frameworks, evaluation methodologies, and quality gates throughout the LLM development lifecycle. The ideal candidate will ensure our Arabic LLM meets the highest standards of performance, reliability, and cultural appropriateness before being deployed to 20,000 government employees.
Key Responsibilities:
- Design and implement comprehensive data quality frameworks specific to Arabic language datasets for LLM training and evaluation
- Establish and enforce quality gates at each project phase (data preparation, model training, evaluation, and RAG implementation)
- Develop detailed acceptance criteria for each phase gate requiring formal sign-off from key stakeholders
- Create and implement quality metrics for data annotation, achieving >90% inter-annotator agreement and >95% cultural/contextual accuracy (see the measurement sketch after this list)
- Design and maintain data pipeline quality assurance processes for Arabic text normalization, diacritics standardization, and dialect variation mapping
- Implement Arabic-specific tokenization optimization with >98% vocabulary coverage and >95% morphological accuracy
- Develop comprehensive RAG quality measurement frameworks covering both retrieval and generation metrics
- Establish automated monitoring systems for continuous quality assessment with real-time dashboards
- Create and enforce testing protocols for model evaluation across various Arabic language tasks
- Implement robust regression testing frameworks to ensure model updates maintain or improve quality metrics
- Develop protocols for bias detection and mitigation in both training data and model outputs
- Support the implementation of benchmarking against global standards
- Design human evaluation frameworks to assess model outputs qualitatively
- Collaborate with data annotation teams to ensure high-quality ground truth data
- Participate in weekly quality committee meetings and bi-weekly RAG performance reviews
- Create and maintain quality documentation including processes, guidelines, and acceptance criteria
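For illustration, the inter-annotator agreement target above is commonly measured either as raw percent agreement or with a chance-corrected statistic such as Cohen's kappa. A minimal sketch using scikit-learn follows; the label lists are hypothetical stand-ins for an annotation-platform export.

```python
# Minimal sketch: chance-corrected inter-annotator agreement via Cohen's kappa.
# The labels below are hypothetical; in practice they would be exported from an
# annotation platform, one label per item, aligned across the two annotators.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["formal", "dialect", "formal", "dialect", "formal", "formal"]
annotator_b = ["formal", "dialect", "formal", "formal", "formal", "formal"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.3f}")  # ~0.8+ is usually read as strong agreement
```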
Requirements:
- Bachelor's or Master's degree in Computer Science, AI, Machine Learning, or a related field
- 4+ years of experience in AI/ML quality assurance, with a specific focus on natural language processing
- Strong understanding of LLM evaluation methodologies and benchmarking techniques
- Experience establishing quality gates and acceptance criteria for AI systems
- Hands-on experience with data quality frameworks and validation techniques
- Experience implementing multi-level annotation review processes with clear metrics
- Proficiency in designing data pipeline quality assurance systems for Arabic language processing
- Experience with RAG quality assessment covering both retrieval and generation components
- Ability to establish and track performance metrics against benchmarks
- Experience implementing automated testing frameworks and continuous integration for ML systems
- Strong knowledge of bias detection and fairness assessment in AI systems
- Familiarity with the Arabic language and NLP challenges specific to Semitic languages
- Experience with human evaluation protocols and annotation quality assessment
- Proficiency in Python and relevant testing/quality assurance libraries
- Understanding of statistical analysis techniques for model evaluation (see the sketch after this list)
- Experience with data annotation platforms and quality control mechanisms
- Knowledge of responsible AI practices and ethical considerations
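As a sketch of the statistical analysis mentioned above, one common technique is a bootstrap confidence interval around an evaluation score. The example below assumes NumPy; the 0/1 correctness array is a hypothetical stand-in for real per-example evaluation results.

```python
# Minimal sketch: bootstrap 95% confidence interval for per-example accuracy.
# `correct` is a hypothetical 0/1 array marking whether each model output
# matched its reference answer.
import numpy as np

rng = np.random.default_rng(seed=0)
correct = rng.integers(0, 2, size=500)  # stand-in for real evaluation results

boot_means = np.array([
    rng.choice(correct, size=correct.size, replace=True).mean()
    for _ in range(10_000)
])
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"accuracy = {correct.mean():.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```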
Preferred Qualifications:
- Experience with LLM evaluation specifically for government or enterprise applications
- Knowledge of Arabic-specific LLM benchmarks
- Experience with RAG system evaluation and quality assurance
- Familiarity with platforms like Scale AI, Humanloop, or other annotation/evaluation systems
- Experience with hallucination detection and factual consistency verification
- Knowledge of prompt engineering and prompt quality assessment
- Experience with MLOps and quality gates in CI/CD pipelines for ML
- Proficiency with data lineage tracking and documentation
- Experience implementing A/B testing frameworks for model comparison
- Familiarity with user experience testing for AI applications
- Experience with security and privacy testing for AI systems
- Knowledge of ROUGE, BLEU, BERTScore, and other NLP evaluation metrics (see the sketch below)
- Experience creating custom metrics for domain-specific tasks
- Experience participating in quality governance committees
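For reference, ROUGE and BLEU are n-gram overlap metrics between generated and reference text (BERTScore, which requires downloading a pretrained model, is omitted here). A minimal sketch assuming the `nltk` and `rouge-score` packages are installed; the sentence pair is hypothetical.

```python
# Minimal sketch: reference-based overlap metrics for generated text.
# Assumes `pip install nltk rouge-score`; the strings are hypothetical.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "the committee approved the annual budget on monday"
candidate = "the committee approved the budget on monday"

# BLEU: geometric mean of n-gram precisions against tokenized references.
bleu = sentence_bleu(
    [reference.split()], candidate.split(),
    smoothing_function=SmoothingFunction().method1,
)

# ROUGE: n-gram and longest-common-subsequence overlap; F-measure reported here.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=False)
rouge = scorer.score(reference, candidate)

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-1 F1: {rouge['rouge1'].fmeasure:.3f}, "
      f"ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")
```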