LLaMeSIMD – LLM SIMD Intrinsic and Function Translation Benchmarking Suite


LLaMeSIMD is the world's first benchmarking suite designed to evaluate how well large language models (LLMs) can translate between different SIMD (Single Instruction Multiple Data) instruction sets across various CPU architectures.

Think of it as a Rosetta Stone validator for SIMD intrinsics, powered by AI!


  • Multi-Architecture Support:
    SSE4.2 (x86), NEON (ARM), VSX (PowerPC)

  • Dual Test Modes:

    • 1-to-1 Intrinsic Translation: "What's the NEON equivalent of _mm_add_ps?"
    • Full Function Translation: Convert complete SIMD functions between architectures (see the example after this list)
  • Multi-Model Evaluation:
    Test local (Ollama), open (HuggingFace), and proprietary (OpenAI/Claude/DeepSeek) models

  • Scientific Metrics:

    • Levenshtein similarity
    • AST structural similarity
    • Token overlap analysis
  • Beautiful Visualizations:
    Automatic generation of comparison charts and CSV reports
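For example (an illustrative sketch, not drawn from the suite's dataset), the 1-to-1 mode asks for the NEON counterpart of a single SSE intrinsic such as _mm_add_ps, which is vaddq_f32, while the full-function mode expects an entire routine to be rewritten for the target architecture:

#if defined(__SSE__)
#include <xmmintrin.h>

/* SSE version: _mm_add_ps adds four packed single-precision floats. */
void add4_sse(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
#endif

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* NEON translation: vaddq_f32 is the 1-to-1 counterpart of _mm_add_ps,
   and the whole routine is what the full-function mode would expect. */
void add4_neon(const float *a, const float *b, float *out) {
    float32x4_t va = vld1q_f32(a);
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));
}
#endif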


# Clone the repository
git clone https://github.com/VectorCamp/LLaMeSIMD.git
cd LLaMeSIMD

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables, then edit .env with your API keys and model preferences
# To specify your preferred models, list them separated by commas
cp .env.example .env
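The variable names below are only a hypothetical illustration of what such a .env file might contain; the authoritative keys and format are those in the repository's .env.example.

# Hypothetical .env sketch -- check .env.example for the actual variable names
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=...
# Preferred models, listed separated by commas (illustrative names)
MODELS=gpt-4o,claude-3-5-sonnet,deepseek-coder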
1️⃣ Run the Test Suite

# Default architectures
python run_suite.py --engines SSE4.2 NEON

# Or select specific architectures (minimum 2 required)
python run_suite.py --engines NEON VSX

2️⃣ Manually Clean the Produced Results

After running the tests, review and clean the generated results stored in the Suite-Results directory. This step ensures accuracy by removing any artifacts or irrelevant outputs before proceeding to evaluation.

3️⃣ Evaluate the Results

python evaluate_results.py

After evaluation, you'll get:

  • Interactive Plots:
    • Weighted score comparisons across models
    • Architecture-specific performance breakdowns
  • CSV Reports:
    • Detailed metrics for each test case

SIMD optimization is crucial for:

  • High-performance computing
  • Game development
  • Scientific simulations
  • Computer vision
  • Cryptography

LLaMeSIMD helps:

  • Researchers benchmark model capabilities

🏆 Benchmarking Methodology

  • Dataset: Carefully curated intrinsic and function pairs (with significant help from our previously created tool, simd.info)
  • Metrics:
    • Levenshtein Similarity: Character-level accuracy
    • AST Similarity: Structural correctness
    • Token Overlap: Semantic similarity
  • Weighted Scoring: 50% Levenshtein + 30% AST + 20% Token
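As a minimal sketch of how these weights combine (the suite itself is Python-based, so this C snippet is illustrative only), assuming each metric is already a normalised similarity in [0, 1]:

/* Illustrative weighted score: 50% Levenshtein + 30% AST + 20% token overlap,
   each input assumed to be a similarity value in [0, 1]. */
static double weighted_score(double levenshtein, double ast, double token_overlap) {
    return 0.5 * levenshtein + 0.3 * ast + 0.2 * token_overlap;
}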
Roadmap

  • Add AVX2 support
  • Add AVX-512 support
  • Add a Pass@1 compilation metric

BSD 2-Clause — Because performance optimization should be accessible to all!

Happy SIMD-ing! May your vectors always be aligned and your pipelines full! 🚀
