doc: add test generation instructions and update run.sh

Document test generation through Ollama, including the command-line arguments of `generate_tests.py`, and add a `gen` command to `run.sh` that generates tests via Ollama. This gives users clear instructions and a one-command way to generate tests.
2026-01-17 02:40:38 +03:00
parent 5c17378ce4
commit f117c7b23c
11 changed files with 393 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -16,13 +16,31 @@ pip install -r requirements.txt
 ./run.sh run --model llama3 --ollama-url http://localhost:11434
 ```

+### Generating tests via Ollama
+
+```bash
+# Generate 1 test for each category via Ollama
+./run.sh gen
+
+# Or call the Python script directly
+python scripts/generate_tests.py --count 1 --category all --model llama3 --ollama-url http://localhost:11434
+```
+
 ### Via Python

 ```bash
 python src/main.py --model llama3 --ollama-url http://localhost:11434
 ```

-### Arguments
+### Arguments for generate_tests.py
+
+- `--count`: Number of tests to generate (default: 1)
+- `--category`: Test category (translation, summarization, codegen, or all) (default: all)
+- `--model`: Name of the model used to generate tests (required for generation)
+- `--ollama-url`: URL of the Ollama server (required for generation)
+- `--validate`: Validate the tests in the given directory
+
+### Arguments for main.py

 - `--model`: Name of the model to benchmark (required)
 - `--ollama-url`: URL of the Ollama server (required)
@@ -37,6 +55,11 @@ python src/main.py --model llama3 --ollama-url http://localhost:11434
 ./run.sh run --model llama3 --ollama-url http://localhost:11434
 ```

+Generating tests via Ollama:
+```bash
+./run.sh gen
+```
+
 Run only the translation tests:
 ```bash
 ./run.sh run --model llama3 --ollama-url http://localhost:11434 --benchmarks translation
--- a/run.sh
+++ b/run.sh
@@ -39,6 +39,11 @@ if [ -n "$1" ]; then
    python src/main.py "$@"
  elif [[ "$1" == "clean" ]]; then
    clean
+  elif [[ "$1" == "gen" ]]; then
+    activate
+    echo "🤖 Generating tests via Ollama..."
+    python scripts/generate_tests.py --count 1 --category all --model second_constantine/t-lite-it-1.0:7b --ollama-url http://10.0.0.4:11434
+    echo "✅ Tests generated successfully"
  fi
 else
    echo "  Pass the script name as an argument (plus optional script arguments)"
@@ -47,4 +52,5 @@ else
    echo " * upd - update dependencies"
    echo " * run - run the benchmarks"
    echo " * clean - remove reports"
+    echo " * gen - generate tests via Ollama"
 fi
--- a/scripts/README.md
+++ b/scripts/README.md
@@ -0,0 +1,86 @@
+# Test generation scripts
+
+This directory contains utilities for automated generation and validation of test data using Ollama.
+
+## generate_tests.py
+
+A script that generates test data for the AI benchmark through an LLM.
+
+### Usage
+
+```bash
+# Generate 2 translation tests via Ollama
+python scripts/generate_tests.py --count 2 --category translation --model llama3 --ollama-url http://localhost:11434
+
+# Generate 1 test for each category
+python scripts/generate_tests.py --count 1 --category all --model llama3 --ollama-url http://localhost:11434
+
+# Generate 3 summarization tests
+python scripts/generate_tests.py --count 3 --category summarization --model llama3 --ollama-url http://localhost:11434
+
+# Validate existing tests
+python scripts/generate_tests.py --validate tests/translation
+```
+
+### Arguments
+
+- `--count`: Number of tests to generate (default: 1)
+- `--category`: Test category (translation, summarization, codegen, or all) (default: all)
+- `--model`: Name of the model used to generate tests (required for generation, e.g. llama3)
+- `--ollama-url`: URL of the Ollama server (required for generation, e.g. http://localhost:11434)
+- `--validate`: Validate the tests in the given directory
+
+### Supported categories
+
+1. **translation** - English-to-Russian translation tests (the LLM generates an English text and its translation)
+2. **summarization** - text summarization tests (the LLM generates a text and its summary)
+3. **codegen** - Python code generation tests (the LLM generates a task and the code)
+
+### How generation works
+
+The script uses an LLM to generate tests dynamically:
+- **Translation**: the LLM writes an English sentence, then translates it into Russian
+- **Summarization**: the LLM generates a short text about technology, science, or programming, then summarizes it
+- **Codegen**: the LLM formulates a programming task, then writes a solution
+
+### Examples of generated tests
+
+#### Translation
+```json
+{
+  "prompt": "Translate the following English text to Russian: 'Hello, how are you today?'",
+  "expected": "Привет, как дела сегодня?"
+}
+```
+
+#### Summarization
+```json
+{
+  "prompt": "Summarize the following text in 1-2 sentences: 'The quick brown fox jumps over the lazy dog...'",
+  "expected": "A quick fox jumps over a lazy dog, surprising it. The fox keeps running while the dog barks."
+}
+```
+
+#### Codegen
+```json
+{
+  "prompt": "Write a Python function that calculates the factorial of a number using recursion.",
+  "expected": "def factorial(n):\n    if n == 0 or n == 1:\n        return 1\n    else:\n        return n * factorial(n-1)"
+}
+```
+
+### Validation
+
+The script automatically validates generated tests:
+- Checks that the required fields (`prompt`, `expected`) are present
+- Checks that the values are strings
+- Checks that the strings are not empty
+- Supports manual validation of existing tests via `--validate`
+
+### Technical details
+
+- The script uses ollama_client.py to connect to the Ollama server
+- Each generated test gets a unique number (test1.json, test2.json, and so on)
+- If a test with that number already exists, the next available number is used
+- All tests are saved as UTF-8 encoded JSON
+- Any model available in Ollama is supported
--- a/scripts/generate_tests.py
+++ b/scripts/generate_tests.py
@@ -0,0 +1,249 @@
+#!/usr/bin/env python3
+"""
+Script that generates test data for the AI benchmark using Ollama.
+
+Generates tests through an LLM for these categories:
+- translation
+- summarization
+- codegen (code generation)
+
+Supports validation of generated tests.
+"""
+
+import argparse
+import json
+import os
+import sys
+from pathlib import Path
+from typing import Dict, List, Optional
+
+# Add the repository root to sys.path so that src.models.ollama_client can be imported
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+from src.models.ollama_client import OllamaClient
+
+def generate_translation_test(ollama: OllamaClient, model: str) -> Dict[str, str]:
+    """Generates a translation test through the LLM."""
+    # Generate the English text
+    en_prompt = 'Generate a simple English sentence for translation test. The sentence should be clear and not too long (5-10 words). Example: "Hello, how are you today?"'
+    en_text = ollama.generate(model, en_prompt).strip()
+
+    # Generate the translation
+    ru_prompt = f"""Translate the following English sentence to Russian:
+"{en_text}"
+Provide only the translation, no additional text."""
+    ru_text = ollama.generate(model, ru_prompt).strip()
+
+    return {
+        "prompt": f"Translate the following English text to Russian: '{en_text}'",
+        "expected": ru_text
+    }
+
+def generate_summarization_test(ollama: OllamaClient, model: str) -> Dict[str, str]:
+    """Generates a summarization test through the LLM."""
+    # Generate the text to summarize
+    text_prompt = 'Generate a short text (3-5 sentences) for summarization test. The text should be about technology, science, or programming. Example: "Artificial intelligence is intelligence demonstrated by machines. It involves studying intelligent agents that perceive their environment and take actions to achieve goals."'
+    text = ollama.generate(model, text_prompt).strip()
+
+    # Generate the summary
+    summary_prompt = f"""Summarize the following text in 1-2 sentences:
+"{text}"
+Provide only the summary, no additional text."""
+    summary = ollama.generate(model, summary_prompt).strip()
+
+    return {
+        "prompt": f"Summarize the following text in 1-2 sentences: '{text}'",
+        "expected": summary
+    }
+
+def generate_codegen_test(ollama: OllamaClient, model: str) -> Dict[str, str]:
+    """Generates a code generation test through the LLM."""
+    # Generate the coding task
+    task_prompt = 'Generate a simple Python programming task. The task should be clear and specific, asking to write a function. Example: "Write a Python function that calculates the factorial of a number using recursion."'
+    task = ollama.generate(model, task_prompt).strip()
+
+    # Generate the code
+    code_prompt = f"""Write Python code to solve the following task:
+"{task}"
+Provide only the code, no explanations or additional text."""
+    code = ollama.generate(model, code_prompt).strip()
+
+    return {
+        "prompt": task,
+        "expected": code
+    }
+
+def generate_test(ollama: OllamaClient, model: str, category: str) -> Dict[str, str]:
+    """Generates a test for the given category through the LLM."""
+    if category == "translation":
+        return generate_translation_test(ollama, model)
+    elif category == "summarization":
+        return generate_summarization_test(ollama, model)
+    elif category == "codegen":
+        return generate_codegen_test(ollama, model)
+    else:
+        raise ValueError(f"Unknown category: {category}")
+
+def validate_test(test_data: Dict[str, str]) -> bool:
+    """Validates the test data."""
+    if not isinstance(test_data, dict):
+        print("❌ The test must be a dictionary (JSON object)")
+        return False
+
+    if "prompt" not in test_data:
+        print("❌ Missing field 'prompt'")
+        return False
+
+    if "expected" not in test_data:
+        print("❌ Missing field 'expected'")
+        return False
+
+    if not isinstance(test_data["prompt"], str):
+        print("❌ Field 'prompt' must be a string")
+        return False
+
+    if not isinstance(test_data["expected"], str):
+        print("❌ Field 'expected' must be a string")
+        return False
+
+    if not test_data["prompt"].strip():
+        print("❌ Field 'prompt' must not be empty")
+        return False
+
+    if not test_data["expected"].strip():
+        print("❌ Field 'expected' must not be empty")
+        return False
+
+    return True
+
+def validate_all_tests(test_dir: str) -> None:
+    """Validates every test in the given directory."""
+    test_dir_path = Path(test_dir)
+    if not test_dir_path.exists():
+        print(f"❌ Directory {test_dir} does not exist")
+        return
+
+    valid_count = 0
+    invalid_count = 0
+
+    for json_file in test_dir_path.glob("*.json"):
+        try:
+            with open(json_file, "r", encoding="utf-8") as f:
+                test_data = json.load(f)
+
+            if validate_test(test_data):
+                valid_count += 1
+                print(f"✅ {json_file.name} - valid")
+            else:
+                invalid_count += 1
+                print(f"❌ {json_file.name} - invalid")
+        except json.JSONDecodeError:
+            invalid_count += 1
+            print(f"❌ {json_file.name} - malformed JSON")
+        except Exception as e:
+            invalid_count += 1
+            print(f"❌ {json_file.name} - error: {e}")
+
+    print("\nValidation results:")
+    print(f"Valid tests: {valid_count}")
+    print(f"Invalid tests: {invalid_count}")
+    print(f"Total tests: {valid_count + invalid_count}")
+
+def generate_tests(ollama: OllamaClient, model: str, count: int, category: str, output_dir: str) -> None:
+    """Generates the requested number of tests through the LLM."""
+    if category not in ["translation", "summarization", "codegen", "all"]:
+        print(f"❌ Unknown category: {category}")
+        return
+
+    categories = [category] if category != "all" else ["translation", "summarization", "codegen"]
+
+    for cat in categories:
+        cat_dir = Path(output_dir) / cat
+        cat_dir.mkdir(parents=True, exist_ok=True)
+
+        for _ in range(count):
+            # Find the first test number that is not taken yet
+            test_num = 1
+            while True:
+                test_file = cat_dir / f"test{test_num}.json"
+                if not test_file.exists():
+                    break
+                test_num += 1
+
+            print(f"🤖 Generating test {cat}/test{test_num}.json...")
+
+            # Generate the test through the LLM
+            test_data = generate_test(ollama, model, cat)
+
+            # Validate the generated test
+            if not validate_test(test_data):
+                print(f"❌ Generated an invalid test for {cat}, test number {test_num}")
+                continue
+
+            # Save the test
+            with open(test_file, "w", encoding="utf-8") as f:
+                json.dump(test_data, f, ensure_ascii=False, indent=2)
+
+            print(f"✅ Created test {cat}/test{test_num}.json")
+
+def main():
+    """Main entry point of the script."""
+    parser = argparse.ArgumentParser(
+        description="Test data generator for the AI benchmark, powered by Ollama",
+        epilog="Usage examples:\n"
+               "  python scripts/generate_tests.py --count 2 --category translation --model second_constantine/t-lite-it-1.0:7b --ollama-url http://10.0.0.4:11434\n"
+               "  python scripts/generate_tests.py --category all --model second_constantine/t-lite-it-1.0:7b --ollama-url http://10.0.0.4:11434\n"
+               "  python scripts/generate_tests.py --validate tests/translation"
+    )
+    parser.add_argument(
+        "--count",
+        type=int,
+        default=1,
+        help="Number of tests to generate (default: 1)"
+    )
+    parser.add_argument(
+        "--category",
+        type=str,
+        default="all",
+        choices=["translation", "summarization", "codegen", "all"],
+        help="Test category (translation, summarization, codegen, or all) (default: all)"
+    )
+    parser.add_argument(
+        "--model",
+        type=str,
+        help="Name of the model used to generate tests (required unless --validate is given)"
+    )
+    parser.add_argument(
+        "--ollama-url",
+        type=str,
+        help="URL of the Ollama server (required unless --validate is given)"
+    )
+    parser.add_argument(
+        "--validate",
+        type=str,
+        help="Validate the tests in the given directory (e.g. tests/translation)"
+    )
+
+    args = parser.parse_args()
+
+    if args.validate:
+        print(f"🔍 Validating tests in {args.validate}")
+        validate_all_tests(args.validate)
+    elif not (args.model and args.ollama_url):
+        parser.error("--model and --ollama-url are required unless --validate is given")
+    else:
+        print(f"🤖 Connecting to the Ollama server: {args.ollama_url}")
+        print(f"📝 Generating {args.count} test(s) for category: {args.category}")
+        print(f"🎯 Model in use: {args.model}")
+
+        try:
+            ollama = OllamaClient(args.ollama_url)
+            generate_tests(ollama, args.model, args.count, args.category, "tests")
+        except Exception as e:
+            print(f"❌ Error while generating tests: {e}")
+            sys.exit(1)
+
+    print("\n✨ Done!")
+
+if __name__ == "__main__":
+    main()
--- a/tests/codegen/test2.json
+++ b/tests/codegen/test2.json
@@ -0,0 +1,4 @@
+{
+  "prompt": "Write a Python function that reverses a string.",
+  "expected": "def reverse_string(s):\n    return s[::-1]"
+}
--- a/tests/codegen/test3.json
+++ b/tests/codegen/test3.json
@@ -0,0 +1,4 @@
+{
+  "prompt": "Here's a simple Python programming task:\n\n**Task:** Write a Python function that checks if a given string is a palindrome or not. A palindrome is a word, phrase, number, or other sequences of characters that reads the same forward and backward (ignoring spaces, punctuation, and capitalization).\n\n**Function Signature:**\n```python\ndef is_palindrome(s: str) -> bool:\n    \"\"\"\n    Check if the given string `s` is a palindrome.\n\n    Args:\n        s (str): The input string to check.\n\n    Returns:\n        bool: True if `s` is a palindrome, False otherwise.\n    \"\"\"\n```\n\n**Example:**\n\n```python\nassert is_palindrome(\"racecar\") == True\nassert is_palindrome(\"hello\") == False\nassert is_palindrome(\"A man, a plan, a canal: Panama\") == True  # Ignoring spaces and punctuation\n```\n\n**Hint:** You can use the `str.lower()` method to convert the string to lowercase and the `re` module to remove non-alphanumeric characters.",
+  "expected": "```python\nimport re\n\ndef is_palindrome(s: str) -> bool:\n    \"\"\"\n    Check if the given string `s` is a palindrome.\n\n    Args:\n        s (str): The input string to check.\n\n    Returns:\n        bool: True if `s` is a palindrome, False otherwise.\n    \"\"\"\n    cleaned = re.sub(r'\\W+', '', s.lower())\n    return cleaned == cleaned[::-1]\n```"
+}
--- a/tests/summarization/test2.json
+++ b/tests/summarization/test2.json
@@ -0,0 +1,4 @@
+{
+  "prompt": "Summarize the following text in 1-2 sentences: 'The quick brown fox jumps over the lazy dog. The dog, surprised by the fox's agility, barks loudly. The fox continues running without looking back.'",
+  "expected": "A quick fox jumps over a lazy dog, surprising it. The fox keeps running while the dog barks."
+}
--- a/tests/summarization/test3.json
+++ b/tests/summarization/test3.json
@@ -0,0 +1,4 @@
+{
+  "prompt": "Summarize the following text in 1-2 sentences: 'In the realm of programming, machine learning algorithms enable computers to improve their performance on a specific task without being explicitly programmed for each step. These algorithms learn from data, allowing them to identify patterns and make predictions or decisions with increasing accuracy over time. For instance, deep learning models, which are part of artificial intelligence, use neural networks to process vast amounts of information, making significant advancements in areas such as image recognition and natural language processing. As technology advances, these capabilities are being integrated into various sectors, from healthcare to autonomous vehicles, transforming the way we interact with digital systems and enhancing our understanding of complex data sets.'",
+  "expected": "Machine learning algorithms allow computers to improve their performance on specific tasks through data-driven pattern recognition, leading to advancements in areas like image recognition and natural language processing, and being increasingly integrated into sectors such as healthcare and autonomous vehicles."
+}
--- a/tests/translation/test3.json
+++ b/tests/translation/test3.json
@@ -0,0 +1,4 @@
+{
+  "prompt": "Translate the following English text to Russian: 'What time is it right now?'",
+  "expected": "Который сейчас час?"
+}
--- a/tests/translation/test4.json
+++ b/tests/translation/test4.json
@@ -0,0 +1,4 @@
+{
+  "prompt": "Translate the following English text to Russian: 'What time is it right now?'",
+  "expected": "Который сейчас час?"
+}
--- a/tests/translation/test5.json
+++ b/tests/translation/test5.json
@@ -0,0 +1,4 @@
+{
+  "prompt": "Translate the following English text to Russian: '\"The sun is shining brightly.\"'",
+  "expected": "Солнце светит ярко."
+}
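
Two of the generated files in this commit carry formatting left over from the raw model output: `tests/translation/test5.json` has an extra pair of quotes around the sentence inside the prompt, and the `expected` value of `tests/codegen/test3.json` is wrapped in a Markdown code fence. The sketch below shows one way such output could be normalized before it is stored. It is not part of the commit: `clean_llm_output` is a hypothetical helper, and the rule it applies (strip a single outer fence or a single outer pair of quotes) is an assumption about what a cleanup step might look like.

```python
import re

# Hypothetical helper, not part of the commit.
# The fence marker is built from single backticks so that this example
# can itself be nested inside a Markdown code block.
_TICKS = "`" * 3
_FENCE_RE = re.compile(rf"^{_TICKS}[\w+-]*\n(.*?)\n?{_TICKS}$", re.DOTALL)
_QUOTE_PAIRS = {'"': '"', "'": "'", "“": "”", "«": "»"}


def clean_llm_output(text: str) -> str:
    """Strip one outer Markdown code fence or one outer pair of quotes."""
    text = text.strip()

    # Case 1: the whole answer is a single fenced block.
    fence = _FENCE_RE.match(text)
    if fence:
        return fence.group(1).strip()

    # Case 2: the whole answer is wrapped in one pair of quotes.
    # Skip the stripping if the same quote characters also occur inside,
    # because then the outer characters are probably not a wrapper.
    if len(text) >= 2 and _QUOTE_PAIRS.get(text[0]) == text[-1]:
        inner = text[1:-1]
        if text[0] not in inner and text[-1] not in inner:
            return inner.strip()

    return text


if __name__ == "__main__":
    print(clean_llm_output('"The sun is shining brightly."'))
    print(clean_llm_output(f"{_TICKS}python\ndef f():\n    return 1\n{_TICKS}"))
```

Under these assumptions the helper would be applied to each value returned by `ollama.generate(...)` in place of the bare `.strip()` calls. It is deliberately conservative: text that is neither fenced nor fully quoted is returned unchanged apart from trimmed whitespace.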