July 2022 – Present
Folsom, CA (remote)
- Implemented a modern Python development workflow including continuous integration for code auto-formatting and linting, documentation builds, and automated testing
- Reimplemented legacy Perl monitoring scripts in modern Python
- Implemented a machine learning solution to predict compute resource requirements for batch jobs, reducing both the resources wasted by jobs that overestimate their needs and the number of jobs killed for exceeding underestimated allocations
- Implemented a system to scan log files, identify and categorize errors, and record the pipeline stage in which each occurred
  - The system is fully generalized, with individual scanners configurable via components defined in a YAML file
  - Log files are compressed and saved alongside the analysis results. These files are usable in generated reports to accelerate debugging.