ai-benchmark

Author	SHA1	Message	Date
second_constantine	54cfe0d97b	docs: update README with context size parameter and dynamic benchmark discovery - Added `--context-size` parameter documentation with default value - Updated project structure to reflect dynamic benchmark discovery - Removed static benchmark module references - Added custom benchmark support documentation - Clarified automatic benchmark detection process - Updated test addition instructions for dynamic discovery - Fixed trailing newline in file	2026-02-28 13:10:48 +03:00
second_constantine	fcda2be4a9	refactor: update import paths and benchmark discovery Updated import paths to use direct module references instead of relative paths. Implemented dynamic benchmark discovery based on the contents of the tests/ directory, allowing for more flexible benchmark configuration without requiring code changes. This change improves maintainability and makes it easier to add new benchmarks.	2026-02-21 17:28:52 +03:00
second_constantine	0dc4359755	refactor: standardize prompt parameter naming and enhance benchmark base class - Changed all prompt parameters from '{text}' to '{input}' for consistency - Enhanced Benchmark base class with prompt loading and validation - Added test data loading functionality with proper error handling - Improved initialization to accept prompt path and test data directory - Added validation for prompt format and file existence - Implemented structured test data loading from directory	2026-02-14 23:35:44 +03:00
second_constantine	a62393a097	feat: fewer summarization tests	2026-02-14 22:47:49 +03:00
second_constantine	bfd0ffd3a1	Update .gitignore to exclude .DS_Store files Added .DS_Store to the ignore list to prevent macOS temporary files from being tracked in the repository. This follows the existing pattern of ignoring Python cache files and results directories.	2026-01-29 12:17:00 +03:00
second_constantine	a1343ffbea	refactor: move TEST_SEPARATOR import to constants module Moved TEST_SEPARATOR import from benchmarks.base to constants module in codegen, summarization, and translation benchmarks for better modularity and maintainability. This change improves code organization by centralizing constants in a dedicated module.	2026-01-27 01:07:54 +03:00
second_constantine	7cf34fd14b	Refactor benchmark base class and update main function - Removed unused imports and constants - Simplified run method signature by swapping num_ctx and context_size parameters - Added test case name logging for better traceability - Updated main function to pass context_size to benchmark run method - Improved code clarity and maintainability	2026-01-26 23:38:01 +03:00
second_constantine	25e0a2a96a	Remove "Лог файл" column from report Remove the "Лог файл" (Log file) column from the report generation as it's no longer needed. This simplifies the report structure and removes unused functionality.	2026-01-26 22:40:44 +03:00
second_constantine	ec038053ec	feat: upd tests	2026-01-26 17:59:25 +03:00
second_constantine	432c292462	fix: correct context size argument name in logging The commit corrects the argument name used for logging the context size from `num_ctx` to `context_size` to match the actual parameter name, ensuring accurate logging output. This change improves code consistency and makes the log messages more readable.	2026-01-26 15:29:02 +03:00
second_constantine	f60dbf49f1	feat: Add context size support for benchmarks and update example usage This commit adds support for specifying context size when running benchmarks, which is passed to the Ollama client as the `num_ctx` option. The changes include: - Updated the `run` method in the base benchmark class to accept an optional `context_size` parameter - Modified the Ollama client call to include context size in the options when provided - Updated the `run_benchmarks` function to accept and pass through the context size - Added example usage to the help output showing how to use the new context size parameter - Fixed prompt formatting in the summarization benchmark to use `text` instead of `task` The changes enable running benchmarks with custom context sizes, which is useful for testing models with different context window limitations.	2026-01-26 15:27:37 +03:00
second_constantine	2048e4e40d	feat: enhance summarization prompt and improve MongoDB test generation - Updated summarization prompt to require Russian output and exclude non-textual elements - Upgraded ollama dependency to v0.6.1 - Enhanced run.sh script to support both single record and file-based ID input for MongoDB test generation - Updated documentation in scripts/README.md to reflect new functionality - Added verbose flag to generate_summarization_from_mongo.py for better debugging	2026-01-23 03:49:22 +03:00
second_constantine	d8785ada8a	feat: add test generation scripts and update documentation - Added scripts directory with generate_tests.py for automated test generation - Added prompts directory with category-specific prompts for test generation - Updated README with documentation for test generation workflow - Modified test data format to TXT with '=== разделитель ===' ("separator") separator - Enhanced documentation with sections on test generation, validation, and reporting - Added detailed instructions for using the new test generation capabilities	2026-01-22 22:26:59 +03:00
second_constantine	2a04e6c089	docs: update test format documentation in README Update documentation to reflect the new TXT format with separator for summarization tests instead of JSON format. Clarify that the expected field may be empty if summary generation fails. feat: change test generation to TXT format with separator Change test generation from JSON to TXT format with TEST_SEPARATOR. Add filename sanitization function to handle MongoDB record IDs. Update output path and file naming logic. Add an attempt to generate the expected summary through the LLM, with fallback to an empty string.	2026-01-22 20:40:41 +03:00
second_constantine	2466f1253a	feat: remove timestamp from report filenames for consistency Previously, report filenames included a timestamp (e.g., `benchmark_20231015_143022.md`), which caused issues when regenerating reports as it would create duplicate files. The timestamp is no longer included in the filenames to ensure consistent naming and avoid overwriting conflicts. This change affects both benchmark and summary report generation in `src/utils/report.py`.	2026-01-22 20:20:46 +03:00
second_constantine	8ef3a16e3a	feat: add MongoDB test generation and update dependencies - Added pymongo==3.13.0 to requirements.txt for MongoDB connectivity - Implemented generate_summarization_from_mongo.py script to generate summarization tests from MongoDB - Updated run.sh to support 'gen-mongo' command for MongoDB test generation - Enhanced scripts/README.md with documentation for new MongoDB functionality - Improved help text in run.sh to clarify available commands and usage examples. This commit adds MongoDB integration for test generation and updates the documentation and scripts accordingly.	2026-01-22 20:11:52 +03:00
second_constantine	f117c7b23c	doc: add test generation instructions and update run.sh Added documentation for test generation through Ollama, including new command-line arguments for `generate_tests.py` and updated `run.sh` script. Also added a new `gen` command to `run.sh` for generating tests via Ollama. This improves usability by providing clear instructions and automation for test generation.	2026-01-17 02:40:38 +03:00
second_constantine	5c17378ce4	chore: add __pycache__ to global gitignore Added **/__pycache__/ to .gitignore to prevent Python cache directories from being tracked across all directories. This improves repository cleanliness and reduces unnecessary files in version control.	2026-01-16 22:54:49 +03:00
second_constantine	774d8fed1d	feat: add run.sh script and update documentation - Added run.sh script with init, upd, run, and clean commands - Updated README.md to document run.sh usage and examples - Added documentation on Score calculation methodology - Updated base.py to include score calculation logic	2026-01-16 22:30:48 +03:00
second_constantine	33ba55f4c1	feat: vibe coding rulezz	2026-01-16 22:30:30 +03:00
second_constantine	1a59adf5a5	feat: vibe code done	2026-01-16 19:58:29 +03:00
second_constantine	408f6b86c6	Initial commit	2026-01-16 16:13:12 +00:00

22 Commits