General Information

I was employed as a researcher in the field of data science until February 14, 2024, and defended my dissertation "Leveraging Constraints for User-Centric Feature Selection" on January 20, 2025. My PhD research was about integrating constraints into feature selection for prediction models. With constraints, one may select features based not only on their predictive quality but also on aspects like domain knowledge or interpretability. Thus, constraints can make feature selection more user-centric.

Additionally, I collaborated with other researchers from data science and from application domains like materials science, process verification, and SAT solving. Many of my teaching and research activities have a corresponding GitHub project that contains the code and sometimes further materials. Further, I published Python packages for four of my research projects on PyPI:

  • alfese: Alternative feature selection - Find multiple feature sets (sequentially or simultaneously) that optimize feature-set quality while being sufficiently dissimilar to each other. Version 1.0.0 of the package supports five feature-selection methods.

  • cffs: Constrained (filter) feature selection - Optimize a linear feature-set quality function (univariate filter approach) while considering user constraints formulated in propositional logic and linear arithmetic.

  • csd: Constrained subgroup discovery - Subgroup discovery (1) without constraints, (2) with a limited number of features in the subgroup description, and (3) for finding alternative subgroup descriptions. Version 1.0.0 of the package supports seven subgroup-discovery methods.

  • kpsearch: K-portfolio search - Given the runtimes of multiple algorithms on multiple problem instances, find a subset (with predefined size k) of algorithms which is overall fastest if all algorithms are run in parallel on each instance (or, equivalently, if you have an oracle that always chooses the fastest solver per instance). Version 1.0.0 of the package supports seven portfolio-search methods.
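
The k-portfolio problem described in the last item can be illustrated with a minimal brute-force sketch in Python. This is not the API of the kpsearch package; the function names, the data layout (one dictionary of runtimes per instance), and the toy runtimes are all hypothetical and chosen only to make the objective concrete. Exhaustive search is used here purely for clarity and is feasible only for small numbers of algorithms.

```python
from itertools import combinations


def portfolio_cost(runtimes, portfolio):
    """Total runtime of a portfolio: per instance, only the fastest member counts.

    `runtimes` is a list with one dict per problem instance, mapping an
    algorithm name to its runtime on that instance (hypothetical layout).
    """
    return sum(min(instance[algo] for algo in portfolio) for instance in runtimes)


def best_k_portfolio(runtimes, k):
    """Exhaustively search all size-k subsets and return the cheapest one."""
    algorithms = sorted(runtimes[0])  # fixed order makes tie-breaking deterministic
    return min(combinations(algorithms, k), key=lambda p: portfolio_cost(runtimes, p))


# Toy data (made up): three algorithms on three instances.
runtimes = [
    {"A": 1.0, "B": 5.0, "C": 3.0},
    {"A": 9.0, "B": 2.0, "C": 3.0},
    {"A": 8.0, "B": 7.0, "C": 3.0},
]

best = best_k_portfolio(runtimes, k=2)
print(best, portfolio_cost(runtimes, best))
```

On this toy data, the best single algorithm is "C", while the best pair is ("A", "C") rather than a pair containing "B": a good portfolio combines algorithms that complement each other across instances, not necessarily those that are individually fastest.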

My publications and the corresponding experimental data are listed below.

Publications


Leveraging Constraints for User-Centric Feature Selection. PhD dissertation
Bach, J.
2025, February 6. Karlsruher Institut für Technologie (KIT). doi:10.5445/IR/1000178649
Towards Automatically Refining Low-Quality Domain Knowledge: A Case Study in Healthcare
Bielski, P.; Jendral, S.; Witterauf, L.; Bach, J.
2025. Machine Learning and Principles and Practice of Knowledge Discovery in Databases – International Workshops of ECML PKDD 2023, Turin, Italy, September 18–22, 2023, Revised Selected Papers, Part III, 361–367, Springer Nature Switzerland. doi:10.1007/978-3-031-74633-8_25
Quantifying Domain-Application Knowledge Mismatch in Ontology-Guided Machine Learning
Bielski, P.; Witterauf, L.; Jendral, S.; Mikut, R.; Bach, J.
2024. Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2024). Ed.: D. Aveiro. Vol. 2, 216–226, SciTePress. doi:10.5220/0013065900003838
Knowledge-Guided Learning of Temporal Dynamics and its Application to Gas Turbines
Bielski, P.; Eismont, A.; Bach, J.; Leiser, F.; Kottonau, D.; Böhm, K.
2024. 15th ACM International Conference on Future and Sustainable Energy Systems, Singapore, June 4–7, 2024, 279–290, Association for Computing Machinery (ACM). doi:10.1145/3632775.3661967
Active Learning for SAT Solver Benchmarking
Fuchs, T.; Bach, J.; Iser, M.
2023. Tools and Algorithms for the Construction and Analysis of Systems. Ed.: S. Sankaranarayanan. Pt. 1, 407–425, Springer Nature Switzerland. doi:10.1007/978-3-031-30823-9_21
Leveraging Constraints for User-Centric Selection of Predictive Features
Bach, J.
2022, October 6. AI Hub @ Karlsruhe (2022), Karlsruhe, Germany, October 5–7, 2022
An Empirical Evaluation of Constrained Feature Selection
Bach, J.; Zoller, K.; Trittenbach, H.; Schulz, K.; Böhm, K.
2022. SN Computer Science, 3 (6), Art. No.: 445. doi:10.1007/s42979-022-01338-z
Presentation for the Paper "A Comprehensive Study of k-Portfolios of Recent SAT Solvers"
Bach, J.
2022, August 2. 25th International Conference on Theory and Applications of Satisfiability Testing (SAT 2022), Haifa, Israel, August 2–5, 2022
A Comprehensive Study of k-Portfolios of Recent SAT Solvers
Bach, J.; Iser, M.; Böhm, K.
2022. 25th International Conference on Theory and Applications of Satisfiability Testing (SAT 2022). Ed.: Kuldeep S. Meel, 2:1–2:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (LZI). doi:10.4230/LIPIcs.SAT.2022.2
Data-driven exploration and continuum modeling of dislocation networks
Sudmanns, M.; Bach, J.; Weygand, D.; Schulz, K.
2020. Modelling and simulation in materials science and engineering, 28 (6), Art. No.: 065001. doi:10.1088/1361-651X/ab97ef
Understanding the effects of temporal energy-data aggregation on clustering quality
Trittenbach, H.; Bach, J.; Böhm, K.
2019. Information technology, 61 (2-3), 111–123. doi:10.1515/itit-2019-0014
On the tradeoff between energy data aggregation and clustering quality
Trittenbach, H.; Bach, J.; Böhm, K.
2018. 9th ACM International Conference on Future Energy Systems, e-Energy 2018; Karlsruhe; Germany; 12 June 2018 through 15 June 2018, 399–401, Association for Computing Machinery (ACM). doi:10.1145/3208903.3212038

Teaching

I taught the exercises of "Data Science 1" (old name: "Big Data Analytics") three times and the practical course "Data Science Laboratory Course" (old name: "Analyzing Big Data Laboratory Course") five times. I fundamentally re-designed the exercises of "Data Science 1" while acquiring the "Baden-Württemberg Certificate for Teaching and Learning at University Level". Further, I supervised one project each for the courses "Software Engineering in Practice" ("Praxis der Softwareentwicklung"; project topic: "CS:Select - A Game for Feature Selection in Machine Learning") and "Research Project" ("Praxis der Forschung"; project topic: "Automating SAT Solver Research"). Finally, I supervised three seminar theses, seven bachelor's theses, and three master's theses.

Courses
Title Type Semester
Practical course (P) SS 2023
Project group (Pg) SS 2022
Practical course (P) SS 2022
Project group (Pg) WS 21/22
Lecture WS 21/22
Practical course (P) SS 2021
Lecture WS 20/21
Practical course (P) SS 2020
Lecture WS 19/20
Seminar (S) WS 19/20
Seminar (S) SS 2019
Practical course (P) SS 2019
Lecture (V) WS 18/19
Practical course (P) SS 2018