Foundation Models, Trained on Your Domain

We fine-tune LLMs, vision models, and embedding models on your proprietary data — giving you a model that understands your vocabulary, regulations, and edge cases.

Duration: 3–6 weeks
Team: 1 ML Engineer + 1 Data Specialist

You might be experiencing...

  • Your team is using ChatGPT for document processing, but it consistently makes errors on Arabic text, UAE regulatory terminology, or your proprietary product codes.
  • A general-purpose embedding model is producing poor search and retrieval results because your document corpus uses domain-specific vocabulary it was not trained on.
  • You need an LLM that understands your internal compliance policies, product documentation, or clinical protocols — without being retrained from scratch.
  • API costs for a general-purpose LLM are growing unsustainably — you need a smaller, cheaper fine-tuned model that matches performance on your specific tasks.

Domain fine-tuning adapts a powerful foundation model to your specific industry vocabulary, regulatory context, and task requirements — without training from scratch. It is the fastest path from generic AI capability to domain-specific AI advantage.

When Fine-Tuning Is the Right Approach

Fine-tuning is appropriate when:

  • You have domain vocabulary that general-purpose models mishandle: Arabic financial terms, medical ICD-10 codes, UAE property classifications, GCC retail SKU naming conventions
  • You need consistent output format: structured extraction from unstructured documents requires format consistency that prompt engineering alone cannot reliably achieve at scale
  • Cost and latency matter: a fine-tuned 7B model running on your infrastructure costs 95% less and responds 10x faster than GPT-4 API calls at production volume
  • Data privacy is required: your documents cannot be sent to an external API — fine-tuning allows you to run inference entirely within your own environment

Arabic Language Fine-Tuning

Arabic NLP is a distinct challenge that most Western AI vendors handle poorly. Modern Standard Arabic and Emirati dialect differ significantly from each other and from the training distributions of models like GPT-4. UAE financial documents mix Arabic and English within the same sentence. Healthcare records use Arabic medical terminology with no direct English equivalent.

Our Arabic fine-tuning pipeline starts from Arabic-pretrained base models with genuine multilingual representation, fine-tunes on UAE-specific domain corpora, and evaluates against Arabic-specific benchmarks rather than relying on English-language evaluation sets that miss dialect and domain errors.
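As a small illustration of the mixed-script problem described above, here is a heuristic check — a sketch for illustration, not part of our pipeline — that flags Arabic–English code-switching within a single sentence:

```python
def scripts_in(text: str) -> set:
    """Return the scripts present in a string, using a coarse
    heuristic: the Unicode Arabic block vs. ASCII Latin letters."""
    found = set()
    for ch in text:
        if "\u0600" <= ch <= "\u06FF":   # Unicode Arabic block
            found.add("arabic")
        elif ch.isascii() and ch.isalpha():
            found.add("latin")
    return found

def is_code_switched(text: str) -> bool:
    """True when a sentence mixes Arabic and Latin script, as UAE
    financial documents routinely do."""
    return {"arabic", "latin"} <= scripts_in(text)
```

A sentence like "تم تحويل AED 5,000 عبر SWIFT" triggers the check; monolingual text in either script does not. Real pipelines segment at the token level, but even this coarse signal shows why English-only evaluation sets miss such inputs entirely.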

Engagement Phases

Weeks 1–2

Dataset Curation & Preparation

Curate and prepare fine-tuning dataset from your domain corpus. Format for instruction tuning or continued pre-training. Quality filtering, deduplication, and format validation.
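The curation step above can be sketched as a minimal filter — exact deduplication plus a length floor. Field names and thresholds here are illustrative; a production pipeline adds fuzzy deduplication and schema validation on top:

```python
import hashlib
import json

def curate(records, min_chars=50):
    """Deduplicate and quality-filter raw instruction records.

    Each record is a dict with 'instruction' and 'output' fields
    (names are illustrative). Exact duplicates are detected via a
    content hash; short, low-signal examples are dropped.
    """
    seen = set()
    kept = []
    for rec in records:
        text = (rec.get("instruction", "") + rec.get("output", "")).strip()
        if len(text) < min_chars:        # drop trivially short examples
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:               # drop exact duplicates
            continue
        seen.add(digest)
        kept.append(rec)
    return kept

def to_jsonl(records):
    """Serialise curated records to JSONL for instruction tuning."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```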

Weeks 2–4

Fine-Tuning & Evaluation

Fine-tune base model using LoRA/QLoRA for parameter-efficient training. Evaluate against task-specific benchmarks. Compare to base model and GPT-4 baseline on your evaluation set.
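LoRA's parameter efficiency comes from training only low-rank adapter matrices alongside the frozen base weights. A back-of-envelope sketch of the arithmetic, assuming an illustrative Llama-style geometry (32 layers, hidden size 4096, rank 16 on the four attention projections):

```python
def lora_trainable_params(d_model, n_layers, targets_per_layer, r):
    """Trainable parameters added by LoRA adapters.

    Each adapted square weight W (d_model x d_model) gains two
    low-rank matrices A (d_model x r) and B (r x d_model), so each
    target module contributes 2 * d_model * r trainable parameters.
    """
    return n_layers * targets_per_layer * 2 * d_model * r

# Illustrative 7B-class geometry: 32 layers, d_model=4096, rank 16
# on the four attention projections.
adapters = lora_trainable_params(d_model=4096, n_layers=32,
                                 targets_per_layer=4, r=16)
# ≈ 16.8M trainable parameters — roughly 0.24% of a 7B base model.
```

This is why a LoRA run fits on a single GPU: the optimiser state only covers the adapters, not the 7B frozen weights.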

Weeks 4–6

Deployment & Integration

Package fine-tuned model for inference. Deploy to your cloud environment or on-premises infrastructure. API wrapper, authentication, and latency optimisation.
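A minimal sketch of the API wrapper, assuming the model is served behind an OpenAI-compatible completions endpoint (as vLLM and TGI can expose). The URL, model name, and bearer-token scheme are placeholders, not a fixed interface:

```python
import json
from urllib import request as urlrequest

class InferenceClient:
    """Thin client for a self-hosted fine-tuned model served behind
    an OpenAI-compatible /v1/completions endpoint."""

    def __init__(self, base_url: str, api_key: str, model: str = "domain-llm"):
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key
        self.model = model

    def payload(self, prompt: str, max_tokens: int = 256) -> dict:
        # Temperature 0 favours deterministic structured extraction.
        return {"model": self.model, "prompt": prompt,
                "max_tokens": max_tokens, "temperature": 0.0}

    def build_request(self, prompt: str, max_tokens: int = 256):
        body = json.dumps(self.payload(prompt, max_tokens)).encode("utf-8")
        return urlrequest.Request(
            f"{self.base_url}/v1/completions",
            data=body,
            headers={"Content-Type": "application/json",
                     "Authorization": f"Bearer {self.api_key}"},
        )
```

Sending the built request with `urllib.request.urlopen` (or swapping in an async HTTP client) completes the loop; authentication and latency tuning live in this layer rather than in the model server.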

Deliverables

  • Fine-tuned model weights (LoRA adapters or full weights)
  • Training dataset and curation pipeline (version-controlled)
  • Evaluation report: task performance vs. base model vs. GPT-4 baseline
  • Inference API with documentation
  • Deployment configuration for AWS, Azure, or GCP
  • Fine-tuning runbook for future retraining cycles

Before & After

Task Accuracy
  Before: GPT-4 on UAE compliance documents: 74% accuracy on entity extraction
  After: Fine-tuned model: 91% accuracy — trained on 2,000 annotated UAE compliance docs

Inference Cost
  Before: GPT-4 API: AED 0.18 per document processed
  After: Deployed fine-tuned model: AED 0.009 per document — 95% cost reduction

Latency
  Before: GPT-4 API: 2–8 second response time with rate limiting
  After: Self-hosted fine-tuned model: 180 ms average — suitable for real-time workflows

Tools We Use

  • Hugging Face Transformers + PEFT
  • LoRA / QLoRA
  • vLLM / TGI
  • Weights & Biases

Frequently Asked Questions

What is the difference between fine-tuning and prompt engineering?

Prompt engineering adjusts the input to a general-purpose model at inference time. Fine-tuning updates the model weights using your domain data — the model learns your vocabulary, style, and task patterns during training. Fine-tuning produces a model that outperforms prompt engineering on domain tasks, runs 10-100x faster (smaller model, no long system prompts), and costs far less at scale. For simple tasks on low volumes, prompt engineering is sufficient. For high-volume production workloads or tasks requiring deep domain knowledge, fine-tuning is the correct approach.

Which base models do you fine-tune?

We work with open-weight models (Llama 3, Mistral, Phi-3, Qwen2) for deployments requiring on-premises or private cloud hosting, and with API-based fine-tuning (OpenAI fine-tuning API, Gemini fine-tuning) for clients using managed infrastructure. For Arabic language tasks, we start from Arabic-pretrained base models (AraBERT, AraT5, Jais) rather than fine-tuning Western-trained models with weak Arabic foundations.

How much data do I need for fine-tuning?

For instruction fine-tuning (adapting an LLM to follow domain-specific instructions), 500–5,000 high-quality examples are typically sufficient with LoRA. For continued pre-training on domain text, plan for 1M+ tokens of domain corpus. For embedding model fine-tuning (improving retrieval), 1,000–10,000 query–document pairs. We assess your data in the first week and recommend the appropriate approach for your volume and quality.
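For reference, one instruction-tuning record in the JSONL format such a dataset would use — the field names follow common instruction-tuning conventions, and the entity and licence code are invented for illustration:

```python
import json

# A single instruction-tuning example; one such JSON object per line
# of the training file. Entity name and licence code are invented.
example = {
    "instruction": "Extract the regulated entity and licence number from the clause.",
    "input": "Gulf Trading LLC holds licence CN-4412 under CBUAE rules.",
    "output": '{"entity": "Gulf Trading LLC", "licence": "CN-4412"}',
}
line = json.dumps(example, ensure_ascii=False)  # one JSONL line
```

Keeping the `output` field itself as strict JSON is what trains the format consistency that prompt engineering struggles to guarantee at scale.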

Can fine-tuned models be run on-premises in the UAE?

Yes. Fine-tuned open-weight models can be deployed entirely within your UAE data centre or private cloud — no data leaves your environment at inference time. This is the preferred deployment model for clients with DIFC Data Protection Law, DHA health data, or CBUAE financial data constraints. We deploy using vLLM or Text Generation Inference (TGI) on your Kubernetes cluster with GPU nodes.
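As a rough GPU-sizing sketch for self-hosting — illustrative arithmetic, not a quoted hardware spec — the memory floor is set by parameter count and precision, with KV cache and batching headroom on top:

```python
def weights_memory_gib(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """GiB of GPU memory for model weights alone at the given
    precision (2 bytes/param = fp16/bf16, 1 = int8, 0.5 = 4-bit).
    Ignores KV cache and activation overhead, which add more."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

# A 7B model in bf16 needs ~13 GiB for weights, so it fits a single
# 24 GB GPU with headroom for KV cache and request batching.
```

The same arithmetic explains why 4-bit quantisation (~3.3 GiB for a 7B model) opens up much cheaper GPU nodes, at some accuracy cost that the evaluation phase has to measure.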

Build It. Run It. Own It.

Book a free 30-minute AI discovery call with our Vertical AI experts in Dubai, UAE. We scope your first model, estimate data requirements, and show you the fastest path to production.

Talk to an Expert