General

I was a research associate in the field of data science until February 14, 2024, and defended my dissertation "Leveraging Constraints for User-Centric Feature Selection" on January 20, 2025. Specifically, my doctoral research dealt with integrating constraints into feature selection for prediction models. Such constraints help to select features not only by their predictive quality but also by further aspects such as domain knowledge or interpretability, which makes feature selection more user-centric.

In addition, I collaborated with other researchers in data science and in application areas such as materials science, process verification, and SAT solving. For many of my research and teaching activities, I created GitHub projects that contain the code and, in some cases, further materials. For four of my research projects, I also published Python packages on PyPI:

alfese: Alternative feature selection - Find multiple feature sets (sequentially or simultaneously) that optimize feature-set quality while being sufficiently dissimilar to each other. Version 1.0.0 of the package supports five feature-selection methods.
cffs: Constrained (filter) feature selection - Optimize a linear feature-set quality function (univariate filter approach) while considering user constraints formulated in propositional logic and linear arithmetic.
csd: Constrained subgroup discovery - Subgroup discovery (1) without constraints, (2) with a limited number of features in the subgroup description, and (3) for finding alternative subgroup descriptions. Version 1.0.0 of the package supports seven subgroup-discovery methods.
kpsearch: K-portfolio search - Given the runtimes of multiple algorithms on multiple problem instances, find a subset (with predefined size k) of algorithms which is overall fastest if all algorithms are run in parallel on each instance (or, equivalently, if you have an oracle that always chooses the fastest solver per instance). Version 1.0.0 of the package supports seven portfolio-search methods.
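
The k-portfolio problem described for kpsearch can be illustrated with a minimal sketch in plain Python. This sketch does not use the kpsearch package or its API. The function names, the example runtimes, the exhaustive search, and the choice of summed runtimes as the objective are assumptions made for illustration only.

```python
from itertools import combinations


def portfolio_cost(runtimes, portfolio):
    """Sum over all instances of the fastest runtime within the portfolio.

    This models an oracle that always picks the fastest solver per instance.
    """
    return sum(min(row[solver] for solver in portfolio) for row in runtimes)


def best_k_portfolio(runtimes, k):
    """Exhaustively evaluate all solver subsets of size k.

    Only feasible for a small number of solvers, since the number of
    subsets grows combinatorially.
    """
    n_solvers = len(runtimes[0])
    return min(
        combinations(range(n_solvers), k),
        key=lambda portfolio: portfolio_cost(runtimes, portfolio),
    )


# Hypothetical runtimes in seconds: rows are instances, columns are solvers 0..2.
runtimes = [
    [1.0, 5.0, 9.0],
    [8.0, 2.0, 9.0],
    [7.0, 6.0, 3.0],
    [2.0, 4.0, 3.0],
]

best = best_k_portfolio(runtimes, k=2)
print(best, portfolio_cost(runtimes, best))
```

Exhaustive search is exact but scales poorly, which is one reason why heuristic search methods for this problem are of interest.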

My publications and the associated experimental data are listed below.

Publications

Alternative feature selection with user control
Bach, J.; Böhm, K.
2025. International Journal of Data Science and Analytics, 20 (2), 1305–1327. doi:10.1007/s41060-024-00527-8

Poster for the Paper "Subgroup Discovery with Small and Alternative Feature Sets"
Bach, J.
2025, June 26. ACM SIGMOD/PODS International Conference on Management of Data (2025), Berlin, Germany, June 22–27, 2025

Presentation for the Paper "Subgroup Discovery with Small and Alternative Feature Sets"
Bach, J.
2025, June 24. ACM SIGMOD/PODS International Conference on Management of Data (2025), Berlin, Germany, June 22–27, 2025

Active Learning for SAT Solver Benchmarking – Extended and Revised Version
Fuchs, T.; Bach, J.; Iser, M.
2025. Journal of Automated Reasoning, 69 (3), 16. doi:10.1007/s10817-025-09729-6

Subgroup Discovery with Small and Alternative Feature Sets
Bach, J.
2025. Proceedings of the ACM on Management of Data, 3 (3), 221. doi:10.1145/3725358

Experimental Data for the Paper "Subgroup Discovery with Small and Alternative Feature Sets"
Bach, J.
2025, March 27. doi:10.35097/nftgaf7w73hy2491

Experimental Data for the Paper "Using Constraints to Discover Sparse and Alternative Subgroup Descriptions" (Version 2)
Bach, J.
2025, February 20. doi:10.35097/8ppb5x50nyvw1wa7

Leveraging Constraints for User-Centric Feature Selection. Dissertation
Bach, J.
2025, February 6. Karlsruher Institut für Technologie (KIT). doi:10.5445/IR/1000178649

Experimental Data for the Paper "Finding Optimal Diverse Feature Sets with Alternative Feature Selection" (Version 3)
Bach, J.
2025, January 28. doi:10.35097/4ttgrpx92p30jwww

Defense Presentation for the Dissertation "Leveraging Constraints for User-Centric Feature Selection"
Bach, J.
2025, January 20. Karlsruher Institut für Technologie (KIT)

Towards Automatically Refining Low-Quality Domain Knowledge: A Case Study in Healthcare
Bielski, P.; Jendral, S.; Witterauf, L.; Bach, J.
2025. Machine Learning and Principles and Practice of Knowledge Discovery in Databases – International Workshops of ECML PKDD 2023, Turin, Italy, September 18–22, 2023, Revised Selected Papers, Part III, 361–367, Springer Nature Switzerland. doi:10.1007/978-3-031-74633-8_25

Experimental Data for the Dissertation "Leveraging Constraints for User-Centric Feature Selection"
Bach, J.
2024, November 8. doi:10.35097/4kjyeg0z2bxmr6eh

Quantifying Domain-Application Knowledge Mismatch in Ontology-Guided Machine Learning
Bielski, P.; Witterauf, L.; Jendral, S.; Mikut, R.; Bach, J.
2024. Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2024). Ed.: D. Aveiro. Vol. 2, 216–226, SciTePress. doi:10.5220/0013065900003838

Using Constraints to Discover Sparse and Alternative Subgroup Descriptions
Bach, J.
2024. arXiv. doi:10.48550/arXiv.2406.01411

Experimental Data for the Paper "Using Constraints to Discover Sparse and Alternative Subgroup Descriptions"
Bach, J.
2024, June 3. doi:10.35097/caKKJCtoKqgxyvqG

Knowledge-Guided Learning of Temporal Dynamics and its Application to Gas Turbines
Bielski, P.; Eismont, A.; Bach, J.; Leiser, F.; Kottonau, D.; Böhm, K.
2024. 15th ACM International Conference on Future and Sustainable Energy Systems, Singapore, 4th-7th June 2024, 279–290, Association for Computing Machinery (ACM). doi:10.1145/3632775.3661967

Experimental Data for the Paper "Alternative feature selection with user control"
Bach, J.
2024, March 21. doi:10.35097/1975

Experimental Data for the Paper "Finding Optimal Diverse Feature Sets with Alternative Feature Selection" (Version 2)
Bach, J.
2024, February 13. doi:10.35097/1920

Finding Optimal Diverse Feature Sets with Alternative Feature Selection
Bach, J.
2023. arXiv. doi:10.48550/arXiv.2307.11607

Experimental Data for the Paper "Finding Optimal Diverse Feature Sets with Alternative Feature Selection"
Bach, J.
2023, July 13. doi:10.35097/1623

Active Learning for SAT Solver Benchmarking
Fuchs, T.; Bach, J.; Iser, M.
2023. Tools and Algorithms for the Construction and Analysis of Systems. Ed.: S. Sankaranarayanan. Pt. 1, 407–425, Springer Nature Switzerland. doi:10.1007/978-3-031-30823-9_21

Leveraging Constraints for User-Centric Selection of Predictive Features
Bach, J.
2022, October 6. AI Hub @ Karlsruhe (2022), Karlsruhe, Germany, October 5–7, 2022

An Empirical Evaluation of Constrained Feature Selection
Bach, J.; Zoller, K.; Trittenbach, H.; Schulz, K.; Böhm, K.
2022. SN Computer Science, 3 (6), Art. No. 445. doi:10.1007/s42979-022-01338-z

Presentation for the Paper "A Comprehensive Study of k-Portfolios of Recent SAT Solvers"
Bach, J.
2022, August 2. 25th International Conference on Theory and Applications of Satisfiability Testing (SAT 2022), Haifa, Israel, August 2–5, 2022

A Comprehensive Study of k-Portfolios of Recent SAT Solvers
Bach, J.; Iser, M.; Böhm, K.
2022. 25th International Conference on Theory and Applications of Satisfiability Testing (SAT 2022). Ed.: Kuldeep S. Meel, 2:1–2:18, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (LZI). doi:10.4230/LIPIcs.SAT.2022.2

Experimental Data for the Paper "An Empirical Evaluation of Constrained Feature Selection"
Bach, J.; Zoller, K.; Schulz, K.
2022, July 26. doi:10.5445/IR/1000148891

Experimental Data for the Paper "A Comprehensive Study of k-Portfolios of Recent SAT Solvers"
Bach, J.; Iser, M.
2022, May 30. doi:10.5445/IR/1000146629

Experimental data for the paper "Analyzing and Predicting Verification of Data-Aware Process Models -- a Case Study with Spectrum Auctions"
Ordoni, E.; Bach, J.; Fleck, A.-K.
2022, February 14. doi:10.5445/IR/1000142949

Analyzing and Predicting Verification of Data-Aware Process Models – a Case Study with Spectrum Auctions
Ordoni, E.; Bach, J.; Fleck, A.
2022. IEEE Access, 10, 31699–31713. doi:10.1109/ACCESS.2022.3154445

Data-driven exploration and continuum modeling of dislocation networks
Sudmanns, M.; Bach, J.; Weygand, D.; Schulz, K.
2020. Modelling and Simulation in Materials Science and Engineering, 28 (6), Art. No. 065001. doi:10.1088/1361-651X/ab97ef

Understanding the effects of temporal energy-data aggregation on clustering quality
Trittenbach, H.; Bach, J.; Böhm, K.
2019. Information technology, 61 (2-3), 111–123. doi:10.1515/itit-2019-0014

On the tradeoff between energy data aggregation and clustering quality
Trittenbach, H.; Bach, J.; Böhm, K.
2018. 9th ACM International Conference on Future Energy Systems, e-Energy 2018; Karlsruhe; Germany; 12 June 2018 through 15 June 2018, 399–401, Association for Computing Machinery (ACM). doi:10.1145/3208903.3212038

Teaching

I was the exercise instructor for the course "Data Science 1" (former name: "Analysetechniken für große Datenbestände") three times and led the lab course "Praktikum Data Science" (former name: "Praktikum: Analyse großer Datenbestände") five times. I fundamentally redesigned the exercise for "Data Science 1" as part of the "Baden-Württemberg-Zertifikat für Hochschuldidaktik" (a certificate program for teaching in higher education). Furthermore, I supervised one project each in the courses "Praxis der Softwareentwicklung" (topic: "CS:Select - Ein Spiel zur Merkmalsauswahl im maschinellen Lernen") and "Praxis der Forschung" (topic: "Automating SAT Solver Research"). I also supervised three seminar theses, seven bachelor's theses, and three master's theses.

Courses
Title	Type	Semester
Praktikum: Analyse großer Datenbestände	Lab course (P)	SS 2018
Praktikum: Analyse großer Datenbestände	Lab course (P)	SS 2019
Recent Research Topics in Workflow Analysis, Privacy and Machine Learning	Seminar (S)	SS 2019
Praktikum: Analyse großer Datenbestände	Lab course (P)	SS 2020
Praktikum Data Science	Lab course (P)	SS 2021
Praktikum Data Science	Lab course (P)	SS 2022
Praxis der Forschung (Projekt, 2. Semester)	Project group (Pg)	SS 2022
Praktikum Data Science	Lab course (P)	SS 2023
Praxis der Softwareentwicklung (PSE)	Lecture (V)	WS 18/19
Recent Research Topics in Data Analysis, Privacy and Machine Learning	Seminar (S)	WS 19/20
Analysetechniken für große Datenbestände	Lecture (V)	WS 19/20
Analysetechniken für große Datenbestände	Lecture (V)	WS 20/21
Praxis der Forschung (Projekt, 1. Semester)	Project group (Pg)	WS 21/22
Data Science 1	Lecture (V)	WS 21/22