Commit Graph

18 Commits

Author SHA1 Message Date
bfd0ffd3a1 Update .gitignore to exclude .DS_Store files
Added .DS_Store to the ignore list to prevent macOS temporary files from being tracked in the repository. This follows the existing pattern of ignoring Python cache files and results directories.
2026-01-29 12:17:00 +03:00
a1343ffbea refactor: move TEST_SEPARATOR import to constants module
Moved TEST_SEPARATOR import from benchmarks.base to constants module in codegen, summarization, and translation benchmarks for better modularity and maintainability. This change improves code organization by centralizing constants in a dedicated module.
2026-01-27 01:07:54 +03:00
7cf34fd14b Refactor benchmark base class and update main function
- Removed unused imports and constants
- Simplified run method signature by swapping num_ctx and context_size parameters
- Added test case name logging for better traceability
- Updated main function to pass context_size to benchmark run method
- Improved code clarity and maintainability
2026-01-26 23:38:01 +03:00
25e0a2a96a Remove "Лог файл" column from report
Remove the "Лог файл" (Log file) column from the report generation as it's no longer needed. This simplifies the report structure and removes unused functionality.
2026-01-26 22:40:44 +03:00
ec038053ec feat: upd tests
2026-01-26 17:59:25 +03:00
432c292462 fix: correct context size argument name in logging
Corrects the argument name used when logging the context size from `num_ctx` to `context_size`, matching the actual parameter name so the log output is accurate and consistent with the code.
2026-01-26 15:29:02 +03:00
f60dbf49f1 feat: Add context size support for benchmarks and update example usage
This commit adds support for specifying context size when running benchmarks, which is passed to the Ollama client as the `num_ctx` option. The changes include:

- Updated the `run` method in the base benchmark class to accept an optional `context_size` parameter
- Modified the Ollama client call to include context size in the options when provided
- Updated the `run_benchmarks` function to accept and pass through the context size
- Added example usage to the help output showing how to use the new context size parameter
- Fixed prompt formatting in the summarization benchmark to use `text` instead of `task`

The changes enable running benchmarks with custom context sizes, which is useful for testing models with different context window limitations.
2026-01-26 15:27:37 +03:00
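The context-size pass-through described in commit f60dbf49f1 can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the helper name `build_ollama_options` is hypothetical, and only the `context_size` parameter and the `num_ctx` option come from the commit message.

```python
from typing import Any, Dict, Optional


def build_ollama_options(context_size: Optional[int] = None) -> Dict[str, Any]:
    """Build the options dict for an Ollama request (hypothetical helper).

    `num_ctx` is set only when a context size was provided, so the model's
    default context window applies otherwise.
    """
    options: Dict[str, Any] = {}
    if context_size is not None:
        options["num_ctx"] = context_size
    return options


# Assumed usage with the ollama Python client, which accepts an `options` dict:
#   ollama.chat(model=model_name, messages=messages,
#               options=build_ollama_options(context_size))
```

Leaving the key out entirely when no context size is given keeps the default behaviour unchanged for existing benchmark runs.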
2048e4e40d feat: enhance summarization prompt and improve MongoDB test generation
- Updated summarization prompt to require Russian output and exclude non-textual elements
- Upgraded ollama dependency to v0.6.1
- Enhanced run.sh script to support both single record and file-based ID input for MongoDB test generation
- Updated documentation in scripts/README.md to reflect new functionality
- Added verbose flag to generate_summarization_from_mongo.py for better debugging
2026-01-23 03:49:22 +03:00
d8785ada8a feat: add test generation scripts and update documentation
- Added scripts directory with generate_tests.py for automated test generation
- Added prompts directory with category-specific prompts for test generation
- Updated README with documentation for test generation workflow
- Modified test data format to TXT with '=== разделитель ===' separator (the word is Russian for "separator")
- Enhanced documentation with sections on test generation, validation, and reporting
- Added detailed instructions for using the new test generation capabilities
2026-01-22 22:26:59 +03:00
2a04e6c089 docs: update test format documentation in README
Update documentation to reflect new TXT format with separator for summarization tests instead of JSON format. Clarify that expected field may be empty if summary generation fails.

feat: change test generation to TXT format with separator

Change test generation from JSON to TXT format with TEST_SEPARATOR. Add filename sanitization function to handle MongoDB record IDs. Update output path and file naming logic. Add attempt to generate expected summary through LLM with fallback to empty string.
2026-01-22 20:40:41 +03:00
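The TXT test format and filename sanitization from commits d8785ada8a and 2a04e6c089 might look something like the sketch below. Several things here are assumptions: that `TEST_SEPARATOR` holds the '=== разделитель ===' literal, that the source text comes before the separator and the expected summary after it, and the function names `sanitize_filename`, `format_test`, and `parse_test`, which are hypothetical.

```python
import re
from typing import Tuple

# Assumption: TEST_SEPARATOR equals the separator literal quoted in the log.
TEST_SEPARATOR = "=== разделитель ==="


def sanitize_filename(record_id: str) -> str:
    """Replace characters that are unsafe in file names with underscores."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", record_id)


def format_test(text: str, expected: str = "") -> str:
    """Serialize one test; `expected` may be empty if summary generation failed."""
    return f"{text.strip()}\n{TEST_SEPARATOR}\n{expected.strip()}\n"


def parse_test(content: str) -> Tuple[str, str]:
    """Split a test file into (text, expected) at the first separator."""
    text, _, expected = content.partition(TEST_SEPARATOR)
    return text.strip(), expected.strip()
```

With `str.partition`, a file with an empty expected part still parses cleanly, which matches the stated fallback to an empty string.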
2466f1253a feat: remove timestamp from report filenames for consistency
Previously, report filenames included a timestamp (e.g., `benchmark_20231015_143022.md`), so regenerating a report created a duplicate file instead of replacing the old one. Filenames no longer include the timestamp, so each report has a stable name across runs. This change affects both benchmark and summary report generation in `src/utils/report.py`.
2026-01-22 20:20:46 +03:00
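The effect of commit 2466f1253a can be illustrated with a small sketch. The function name `report_path`, its parameters, and the resulting `benchmark.md` name are all assumptions for illustration; the log only says that the timestamp was dropped from report filenames.

```python
from pathlib import Path


def report_path(results_dir: str, kind: str) -> Path:
    """Return a stable report path with no timestamp (hypothetical helper).

    The same (results_dir, kind) pair always maps to the same file, so a
    regenerated report lands at the same location as the previous one.
    """
    return Path(results_dir) / f"{kind}.md"
```

Calling `report_path("results", "benchmark")` twice yields the same path, which is the property the timestamped names lacked.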
8ef3a16e3a feat: add MongoDB test generation and update dependencies
- Added pymongo==3.13.0 to requirements.txt for MongoDB connectivity
- Implemented generate_summarization_from_mongo.py script to generate summarization tests from MongoDB
- Updated run.sh to support 'gen-mongo' command for MongoDB test generation
- Enhanced scripts/README.md with documentation for new MongoDB functionality
- Improved help text in run.sh to clarify available commands and usage examples

This commit adds MongoDB integration for test generation and updates the documentation and scripts accordingly.
2026-01-22 20:11:52 +03:00
f117c7b23c doc: add test generation instructions and update run.sh
Added documentation for test generation through Ollama, including new command-line arguments for `generate_tests.py` and updated `run.sh` script. Also added a new `gen` command to `run.sh` for generating tests via Ollama. This improves usability by providing clear instructions and automation for test generation.
2026-01-17 02:40:38 +03:00
5c17378ce4 chore: add __pycache__ to global gitignore
Added **/__pycache__/ to .gitignore to prevent Python cache directories from being tracked across all directories. This improves repository cleanliness and reduces unnecessary files in version control.
2026-01-16 22:54:49 +03:00
774d8fed1d feat: add run.sh script and update documentation
- Added run.sh script with init, upd, run, and clean commands
- Updated README.md to document run.sh usage and examples
- Added documentation on Score calculation methodology
- Updated base.py to include score calculation logic
2026-01-16 22:30:48 +03:00
33ba55f4c1 feat: vibe coding rulezz
2026-01-16 22:30:30 +03:00
1a59adf5a5 feat: vibe code done
2026-01-16 19:58:29 +03:00
408f6b86c6 Initial commit
2026-01-16 16:13:12 +00:00