Glossary
Browse our comprehensive glossary of technical terms organized by expertise area and topic. This glossary covers AI, data science, research data management, compute infrastructure, and software engineering. Click on an Icon and then on any term to jump to a more detailed explanation.
Artificial Intelligence (AI)
Broad field of machine intelligence encompassing learning, reasoning, and problem-solving capabilities. We offer consultation and implementation support for AI solutions in research contexts. Our expertise spans from traditional machine learning through modern deep learning approaches to the newer LLMs and chatbots.
AI Tools & Ecosystem
AI tools and frameworks that accelerate research workflows through automation and intelligent analysis. We help you integrate modern AI solutions into your research, from setting up language models to deploying automated analysis pipelines.
- AI Agents : Autonomous software programs that take actions to achieve specific goals. Actions can include retrieving data, writing emails, communicating with other systems, and more.
- DeepMind : AI research company known for developing advanced machine learning algorithms and systems. DeepMind was acquired by Google.
- Generative Pre-trained Transformer (GPT) : Type of large language model trained to generate human-like text responses. The best-known examples are the GPT models from OpenAI.
- HuggingFace : Platform providing pre-trained machine learning models and tools for natural language processing. It mostly hosts open, free-to-use LLMs.
- LangChain : Framework for developing applications powered by large language models. Used by programmers and technical users to build applications on top of machine learning and AI.
- OpenAI : AI research and product company providing models and APIs (e.g., GPT) for language and multimodal tasks.
- Vector Databases : Specialized databases optimized for storing and querying high-dimensional vector data. Used in AI applications to store semantic embeddings of text snippets; a minimal similarity-search sketch follows this list.
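To illustrate how vector databases use embeddings, here is a minimal, hypothetical sketch with plain NumPy: text snippets are represented as vectors and the closest match to a query vector is found via cosine similarity. Real vector databases add indexing, persistence, and approximate search; the snippets and random "embeddings" here are placeholders.

```python
import numpy as np

# Hypothetical embeddings: in practice these come from an embedding model.
snippets = ["soil moisture sensor", "battery degradation", "protein folding"]
embeddings = np.random.default_rng(0).normal(size=(3, 8))   # one 8-d vector per snippet
query = np.random.default_rng(1).normal(size=8)             # embedding of a search query

# Cosine similarity between the query and every stored vector.
sims = embeddings @ query / (
    np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query)
)
best = int(np.argmax(sims))
print(f"Closest snippet: {snippets[best]!r} (similarity {sims[best]:.2f})")
```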
Computer Vision
Techniques for extracting information and insights from images and videos using machine learning. Our team provides expertise in image processing, object detection, and pattern recognition.
- Image processing🔗 : Techniques for analyzing, enhancing, and manipulating digital images.
- Image Segmentation : Computer vision technique for partitioning images into meaningful regions or objects.
- Object Detection : Computer vision technique for identifying and locating objects within images or videos.
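As a minimal illustration of the image-processing and segmentation terms above, the sketch below thresholds a synthetic grayscale image into foreground and background, which is a very simple form of segmentation. It assumes only NumPy; real pipelines typically use libraries such as scikit-image or OpenCV.

```python
import numpy as np

# Synthetic 8x8 grayscale "image" with a bright square in the middle.
image = np.zeros((8, 8))
image[2:6, 2:6] = 0.9
image += np.random.default_rng(0).normal(scale=0.05, size=image.shape)  # add noise

# Threshold-based segmentation: pixels above 0.5 belong to the foreground object.
mask = image > 0.5
print("Foreground pixel count:", int(mask.sum()))
print(mask.astype(int))
```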
Ethics, Security and Privacy
Ethical considerations and societal impacts of research technologies, particularly AI. Many publishers have their own guidelines on what is allowed; we can provide guidance on responsible research practices, bias mitigation, and the ethical implications of these technological solutions.
- Bias : Systematic errors or prejudices in data, algorithms, or decision-making processes. Just like humans, machines can exhibit biases, such as gender bias, if the training data is skewed or training is not done carefully.
- Cybersecurity with AI : Application of artificial intelligence techniques to detect, prevent, and respond to cyber threats. AI is a tool that can be used by both attackers and defenders.
- Data Protection Act : Legal framework complementing the GDPR for governing the processing and protection of personal data.
- Explainability : Ability to understand and interpret how machine learning models make decisions.
- General Data Protection Regulation (GDPR) : EU regulation governing the processing and protection of personal data. The corresponding Swiss regulation is the Federal Act on Data Protection (FADP).
- Handling of Sensitive Data : The legal and ethical obligations that govern handling sensitive data, as well as the technical means to meet them.
- Transparency🔗 : Openness about research methods, data, and processes to enable verification and understanding. For AI specifically, this includes open model weights as well as openness about training methods, data collection, and other details of how the system was built.
Future Directions
AI is still a very dynamic field, with new tools and trends continually emerging for research technology and computational methods. We track developing technologies and help researchers prepare for upcoming opportunities.
- AI Safety : Ensuring AI systems operate reliably, ethically, and without causing harm.
- Artificial General Intelligence (AGI) : Hypothetical AI that matches or exceeds human intelligence across all cognitive tasks.
- Human-AI Collaboration and Societal Impact : Designing how people work with AI systems and understanding the broader impacts on society, work, and decision-making.
- Quantum AI : Application of quantum computing principles to artificial intelligence and machine learning.
- Synthetic Data : Artificially generated data that mimics real data characteristics while preserving privacy.
Generative AI
AI systems that create new content, data, or solutions including text, images, and molecular designs. ChatGPT is a generative model that generates text. We provide expertise in choosing and finding use cases for generative AI. Our support covers model selection, deployment, and integration strategies.
- Diffusion Models : Generative AI models that create new data by learning to reverse a noise-addition process. The main technology behind generating images using AI.
- Large Language Models (LLMs) : AI models trained on vast text datasets to understand and generate human-like language.
- Text-to-Image : AI technology for generating images from textual descriptions.
- Text-to-Speech : Technology for converting written text into spoken audio output.
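As a small, hedged example of working with an open large language model, the sketch below uses the Hugging Face `transformers` pipeline with the small `gpt2` model. The model name and prompt are illustrative only; larger models need more memory and usually a GPU.

```python
# Requires: pip install transformers torch
from transformers import pipeline

# Load a small, freely available text-generation model from Hugging Face.
generator = pipeline("text-generation", model="gpt2")

prompt = "Research data management is important because"
result = generator(prompt, max_new_tokens=30, num_return_sequences=1)
print(result[0]["generated_text"])
```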
Machine Learning🔗
Algorithms and techniques that enable computers to learn patterns from data and make predictions or decisions. We provide comprehensive support for implementing machine learning solutions in research contexts. Our expertise spans from classical methods to modern deep learning approaches. Also have a look at the Python learning material we offer on GitHub.
- Machine Learning Operations (MLOps) : Practices for deploying, monitoring, and maintaining machine learning models in production. See also CI/CD.
- Model Explainability : Techniques for understanding and interpreting how machine learning models make predictions.
- Reinforcement/Un-/Supervised Learning : Machine learning paradigms that differ in how much human guidance is given during training. The paradigm is usually chosen depending on the type and amount of training data available. A brief sketch follows this list.
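The brief sketch below contrasts supervised learning (labels guide the model) with unsupervised learning (structure is found without labels), using scikit-learn on a synthetic toy dataset. It is illustrative only and assumes `scikit-learn` is installed.

```python
# Requires: pip install scikit-learn
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])  # two blobs
y = np.array([0] * 50 + [1] * 50)                                      # known labels

# Supervised: the labels guide the training.
clf = LogisticRegression().fit(X, y)
print("Supervised accuracy:", clf.score(X, y))

# Unsupervised: clusters are found without using the labels at all.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("Cluster sizes:", np.bincount(clusters))
```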
Compute🔗
Computational resources and infrastructure for research including CPUs, GPUs, storage, specialized hardware and cloud resources. We help you find optimal compute solutions whether through CSCS supercomputers, cloud platforms, or local clusters. Our expertise covers resource allocation, software installation, and performance optimization.
Algorithms and Computation
Mathematical algorithms and computational methods for solving research problems efficiently. We provide expertise in selecting optimal algorithms for your specific use cases and help optimize computational performance. Our team assists with algorithm choices and implementation strategies.
- Computational Efficiency🔗 : Optimizing algorithms and systems to achieve maximum performance with minimal resource usage.
- Data Structures🔗 : Organized formats for storing and managing data efficiently in computer programs. The right choice can improve performance and computation time as well as human interpretability.
- Distributed Systems🔗 : Networked computing systems that coordinate activities across multiple machines to solve a shared large task.
- Graph Algorithms : Computational methods for analyzing and processing graph-structured data and networks.
- Optimisation : Techniques for finding the best solution among many possible alternatives. In software engineering this mostly means optimizing a program's run time, but it can also target storage space, memory allocations, or other technical aspects.
- Parallel Computing : Simultaneous processing of tasks across multiple processors or computers.
- Simulations : Computational modeling of real-world processes for research and prediction purposes.
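As a minimal illustration of parallel computing, the sketch below distributes independent simulation runs across CPU cores with Python's standard-library `concurrent.futures`. The workload is a made-up placeholder for an expensive computation.

```python
from concurrent.futures import ProcessPoolExecutor

def simulate(seed: int) -> float:
    """Placeholder for an expensive, independent simulation run."""
    total = 0.0
    for i in range(100_000):
        total += ((seed * 31 + i) % 97) / 97
    return total

if __name__ == "__main__":
    seeds = range(8)
    # Each simulation runs in its own process, using multiple CPU cores.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(simulate, seeds))
    print([round(r, 1) for r in results])
```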
Cloud Computing🔗
On-demand access to computing resources including servers, storage, and specialized hardware like GPUs. We help you set up cloud environments for your research projects and find cost-effective solutions for temporary or scalable computing needs. Our team assists with cloud deployment and optimization strategies.
- Amazon Web Services (AWS) : Amazon's cloud computing platform providing on-demand computing resources.
- Azure : Microsoft's cloud computing platform offering various services for storage, computing, and networking.
- Cloud Services🔗 : On-demand computing resources and applications delivered over the internet.
- Computational Science : Interdisciplinary field using mathematical models and computer simulations to solve scientific problems.
- Distributed Systems🔗 : Networked computing systems that coordinate activities across multiple machines to solve a shared large task.
- Google Cloud : Google's cloud computing platform offering various services for storage, computing, and machine learning.
- On-Demand Resources🔗 : Computing resources that can be provisioned and accessed when needed without long-term commitments.
- Performance Optimisation : Techniques for improving software speed, efficiency, and resource utilization.
- Scalability : Ability of systems to handle increased workload by adding resources dynamically or through preallocation.
Code Optimisation
Techniques to improve runtime, memory use, and scalability through profiling and algorithmic or implementation improvements.
- Memory Management : Techniques for efficiently allocating and deallocating computer memory resources.
- Parallel Processing : Executing multiple operations simultaneously to improve computational efficiency.
- Performance Optimisation : Techniques for improving software speed, efficiency, and resource utilization.
- Profiling : Analyzing software performance to identify bottlenecks and optimization opportunities.
- Vectorisation : Rewriting computations so they operate on whole arrays at once (e.g., using SIMD or array operations) instead of element-by-element loops. A short comparison follows this list.
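The short comparison below pits an explicit Python loop against the equivalent vectorised NumPy expression; both compute the same result, but the array version runs in compiled code over the whole array. Timings will vary by machine.

```python
import time
import numpy as np

x = np.random.default_rng(0).normal(size=1_000_000)

# Loop version: one Python-level operation per element.
t0 = time.perf_counter()
squares_loop = np.array([v * v for v in x])
t_loop = time.perf_counter() - t0

# Vectorised version: a single array operation.
t0 = time.perf_counter()
squares_vec = x * x
t_vec = time.perf_counter() - t0

assert np.allclose(squares_loop, squares_vec)
print(f"loop: {t_loop:.3f}s  vectorised: {t_vec:.3f}s")
```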
Computing environment🔗
Software development environments and tools for research computing including IDEs, package managers, and virtual environments. We provide education and support for setting up optimal development environments. Our training covers Python environments, Jupyter notebooks, and development best practices.
- Dependencies🔗 : External libraries, packages, or modules that a software project requires to function properly.
- Integrated Development Environment (IDE)🔗 : Applications providing comprehensive tools for software development, bundling essentials like an editor, compiler, debugger, and build tools.
- Package Managers🔗 : Tools for installing, updating, and managing software dependencies and libraries.
- Reproducibility🔗 : Ability to obtain consistent results using the same data and computational methods.
Computing infrastructure🔗
Large-scale computing systems including HPC clusters, supercomputers, and distributed computing platforms for research. We provide access to CSCS supercomputers and help configure computing infrastructure for your projects. Our expertise includes job scheduling, resource management, and performance optimization.
- Centro Svizzero di Calcolo Scientifico (CSCS) : Swiss National Supercomputing Centre providing high-performance computing resources for research.
- Clusters : Groups of interconnected computers working together as a single system for high-performance computing.
- High-Performance Computing (HPC) : High-Performance Computing systems designed for complex computational tasks requiring significant processing power.
- Job Scheduling : System for allocating and managing computational tasks across available computing resources.
- Supercomputers : High-performance computing systems capable of extremely fast processing for complex calculations.
Platform Engineering
Building internal platforms and paved paths that make it easier to deploy and operate software at scale.
- Automated Reconciliation : Continuous process of comparing desired vs actual system state and automatically correcting drift (common in GitOps and controllers).
- Container Orchestration : Tools and processes to schedule, scale, and manage containers across clusters, including health checks and rolling updates.
- GitOps Workflows : Operating infrastructure and apps via Git as the source of truth, with automated controllers applying the declared state.
- Kubernetes : Open-source container orchestration platform for deploying, scaling, and managing containerized applications.
- Policy as Code : Defining and enforcing operational and security policies using version-controlled, testable code instead of manual rules.
Storage Solutions
Approaches and services for storing research data reliably, including filesystems, object storage, backup, and lifecycle management.
- Archiving : Long-term storage and preservation of data for future access and compliance requirements.
- Backup : Creating copies of data to protect against data loss and enable recovery.
- Data Lifecycle🔗 : The entire process of data management, from creation to archiving or disposal.
- Data persistence : Ensuring data remains available and accessible over time through proper storage mechanisms.
- File Systems : Methods for organizing and storing data files on computer storage devices.
- Object Storage : Data storage architecture managing data as objects with metadata and unique identifiers.
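As a hedged sketch of how object storage is typically used from code, the example below uploads and retrieves an object with the AWS S3 client from `boto3`. The bucket name, object key, and local file are hypothetical, credentials must be configured separately, and other providers offer S3-compatible endpoints.

```python
# Requires: pip install boto3 (and configured AWS credentials)
import boto3

s3 = boto3.client("s3")
bucket = "my-research-bucket"            # hypothetical bucket name
key = "experiments/run-001/results.csv"  # object key, i.e. the "path" within the bucket

# Upload a local file (assumed to exist) as an object, then download it again.
s3.upload_file("results.csv", bucket, key)
s3.download_file(bucket, key, "results_copy.csv")
```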
Data Science🔗
Data Science is an interdisciplinary field focused on using scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in various forms. Typically, it lies at the intersection of computer science, mathematics, statistics, and domain knowledge, and it aims at turning raw data into actionable insights, scientific hypotheses, and strategic decisions.
Computer Vision
Techniques for extracting information and insights from images and videos using machine learning. Our team provides expertise in image processing, object detection, and pattern recognition.
- Image processing🔗 : Techniques for analyzing, enhancing, and manipulating digital images.
- Image Segmentation : Computer vision technique for partitioning images into meaningful regions or objects.
- Object Detection : Computer vision technique for identifying and locating objects within images or videos.
Data Engineering🔗
Designing and building systems for collecting, storing, and processing large volumes of research data efficiently. We help create data pipelines and infrastructure that make your data accessible for analysis and AI applications. Our expertise includes Extract, Transform, Load (ETL) processes, data warehousing, and real-time data processing.
- Data Lake : Centralized repository for storing large amounts of structured and/or unstructured data.
- Data Pipeline : Series of automated processes for moving and transforming data from source to destination.
- Data Quality : Measure of data accuracy, completeness, consistency, and reliability for research purposes and automated processes.
- Data Warehouse : Centralized repository optimized for analysis and reporting of integrated data from multiple sources.
- Ontologies : Formal representations of knowledge domains defining concepts and their relationships.
- Schemas : Structured formats defining the organization and constraints of data.
- Semantic Data : Data enriched with meaning and context to enable better understanding and interoperability.
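A minimal sketch of an Extract-Transform-Load step with pandas; the file names and column names are hypothetical, and real pipelines add validation, logging, and scheduling.

```python
# Requires: pip install pandas
import pandas as pd

# Extract: read raw measurements from a hypothetical CSV export.
raw = pd.read_csv("raw_measurements.csv")

# Transform: drop incomplete rows, normalise column names, derive a new column.
clean = raw.dropna()
clean.columns = [c.strip().lower().replace(" ", "_") for c in clean.columns]
clean["temperature_k"] = clean["temperature_c"] + 273.15  # assumes a temperature column

# Load: write the cleaned table to the destination used by downstream analysis.
clean.to_csv("clean_measurements.csv", index=False)
print(f"Loaded {len(clean)} rows")
```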
Data Quality and Life Cycle🔗
Management of data throughout its entire lifecycle from collection to archiving, ensuring quality and accessibility at each stage following the FAIR principles. We provide guidance on data lifecycle planning and quality assurance processes. Our support includes DMP creation and data curation best practices. We work together with the 4RIs to share experience and unify processes.
- Archiving : Long-term storage and preservation of data for future access and compliance requirements.
- Data Collection : Systematic gathering of data according to established procedures and standards.
- Data Management Plan (DMP)🔗 : Data Management Plan outlining how research data will be handled throughout a project. This has become a requirement of many funding organizations.
- FAIR principles : Guidelines making data Findable, Accessible, Interoperable, and Reusable.
- Long-term Preservation : Strategies for maintaining data accessibility and integrity over extended periods.
- Processing : Transforming and manipulating raw data into useful information for analysis.
- Reuse : Utilizing existing data, code, or resources for new research purposes or applications. Part of the FAIR principles.
- Sharing : Making research data and resources available to others in the scientific community.
- Storage : Systems and methods for saving and organizing data for future access and use. See also Storage Solutions.
Exploratory Data Analysis🔗
Initial investigation of datasets to understand patterns, relationships, and anomalies through statistical analysis and visualization. We provide expertise in EDA techniques and help design effective data exploration workflows. Our support includes statistical testing and visualization strategies. We also offer Python courses for beginners and intermediate users.
- Data visualisation : Graphical representation of data to communicate insights and patterns effectively.
- Distribution analysis : Examination of how data values are spread across different ranges or categories.
- Missing value analysis : Techniques for identifying, understanding, and handling incomplete data in datasets. Missing values can either be imputed with artificial data or excluded without biasing the statistics.
- Outlier detection : Statistical techniques for identifying data points that significantly differ from the majority.
- Summary statistics : Descriptive measures that provide key insights about a dataset's central tendencies and spread.
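A short exploratory-data-analysis sketch with pandas covering summary statistics, missing-value counts, and a simple quantile-based outlier check; the dataset is synthetic.

```python
# Requires: pip install pandas numpy
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "measurement": rng.normal(10, 2, 200),
    "group": rng.choice(["A", "B"], 200),
})
df.loc[rng.choice(200, 5, replace=False), "measurement"] = np.nan  # inject missing values

print(df.describe())    # summary statistics
print(df.isna().sum())  # missing value analysis

# Outlier detection via the interquartile range (IQR) rule.
q1, q3 = df["measurement"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["measurement"] < q1 - 1.5 * iqr) | (df["measurement"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} potential outliers")
```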
Feature Engineering🔗
Process of creating meaningful variables from raw data to improve machine learning model performance. We help design and implement feature engineering pipelines for research datasets. Our expertise includes domain-specific feature creation and automated feature selection techniques.
- Dimensionality Reduction : Techniques for reducing the number of variables in data while preserving important information.
- Encoding Variables : Converting categorical or text data into numerical format suitable for machine learning algorithms.
- Feature Selection : Process of choosing the most relevant variables for building predictive models.
- Feature Transformation : Modifying or combining existing data features to improve model performance.
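A brief feature-engineering sketch: encoding a categorical variable as one-hot indicators and standardising a numeric one with pandas. The column names are made up.

```python
# Requires: pip install pandas
import pandas as pd

df = pd.DataFrame({
    "material": ["steel", "wood", "steel", "concrete"],
    "load_kn": [120.0, 35.0, 140.0, 300.0],
})

# Encoding variables: turn the categorical column into one-hot indicator columns.
encoded = pd.get_dummies(df, columns=["material"], prefix="material")

# Feature transformation: standardise the numeric feature to zero mean, unit variance.
encoded["load_kn_std"] = (df["load_kn"] - df["load_kn"].mean()) / df["load_kn"].std()

print(encoded)
```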
Generative AI
AI systems that create new content, data, or solutions including text, images, and molecular designs. ChatGPT is a generative model that generates text. We provide expertise in choosing and finding use cases for generative AI. Our support covers model selection, deployment, and integration strategies.
- Diffusion Models : Generative AI models that create new data by learning to reverse a noise-addition process. The main technology behind generating images using AI.
- Large Language Models (LLMs) : AI models trained on vast text datasets to understand and generate human-like language.
- Text-to-Image : AI technology for generating images from textual descriptions.
- Text-to-Speech : Technology for converting written text into spoken audio output.
Machine Learning🔗
Algorithms and techniques that enable computers to learn patterns from data and make predictions or decisions. We provide comprehensive support for implementing machine learning solutions in research contexts. Our expertise spans from classical methods to modern deep learning approaches. Also have a look at the Python learning material we offer on GitHub.
- Machine Learning Operations (MLOps) : Practices for deploying, monitoring, and maintaining machine learning models in production. See also CI/CD.
- Model Explainability : Techniques for understanding and interpreting how machine learning models make predictions.
- Reinforcement/Un-/Supervised Learning : Machine learning paradigms that differ in how much human guidance is given during training. The paradigm is usually chosen depending on the type and amount of training data available.
Signal Processing🔗
Techniques for analyzing and manipulating signals and time-series data to extract meaningful information. We provide expertise in filtering, transformation, and feature extraction from various types of scientific signals. Our support includes spectral analysis and noise reduction strategies.
- Feature Extraction : Process of identifying and extracting relevant characteristics or patterns from raw signal data.
- Filtering : Techniques for removing noise and unwanted components from signals to improve data quality.
- Fourier Transform : Mathematical transformation technique for analyzing frequency components of signals and data.
- Wavelets : Signal processing tool that provides time-frequency analysis of non-stationary signals.
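A minimal signal-processing sketch: computing the frequency spectrum of a noisy sine wave with NumPy's FFT and picking out the dominant frequency. The sampling rate and signal are synthetic.

```python
import numpy as np

fs = 100.0                                  # sampling rate in Hz
t = np.arange(0, 2, 1 / fs)                 # 2 seconds of samples
signal = np.sin(2 * np.pi * 5 * t)          # 5 Hz sine wave
signal += 0.3 * np.random.default_rng(0).normal(size=t.size)  # additive noise

# Fourier transform of a real-valued signal and the matching frequency axis.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)

print(f"Dominant frequency: {freqs[np.argmax(spectrum)]:.1f} Hz")  # ~5 Hz
```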
Statistical Analysis🔗
Application of statistical methods to analyze research data and draw meaningful conclusions including hypothesis testing and modeling. We provide expertise in statistical design, analysis techniques, and interpretation of results. Besides our own expertise, we also want to point out resources in the ETH Domain, which offers consulting services through the Seminar for Statistics (SfS).
- Analysis of Variance (ANOVA) : Statistical method for comparing means across multiple groups to determine if differences are significant.
- Correlation analysis : Statistical technique to measure and analyze the strength of relationships between variables.
- Descriptive statistics : Statistical measures that summarize and describe the main features of a dataset.
- Hypothesis testing : Statistical methods for testing assumptions and determining significance of observed effects.
- Nonparametric methods : Statistical techniques that don't assume specific probability distributions for the data.
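A short sketch of hypothesis testing with SciPy: a two-sample t-test and a one-way ANOVA on synthetic measurements. The group means and sample sizes are invented for illustration.

```python
# Requires: pip install scipy numpy
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(10.0, 1.0, 30)
group_b = rng.normal(10.5, 1.0, 30)
group_c = rng.normal(11.0, 1.0, 30)

# Two-sample t-test: are the means of A and B significantly different?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t-test: t={t_stat:.2f}, p={p_value:.3f}")

# One-way ANOVA: do the three group means differ?
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)
print(f"ANOVA: F={f_stat:.2f}, p={p_anova:.3f}")
```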
Time Series Analysis🔗
Analysis of data points collected over time to identify trends, patterns, and make predictions about future behavior. We provide expertise in time series modeling, forecasting techniques, and trend analysis for research applications. Our support includes both statistical and machine learning approaches to temporal data.
- Autocorrelation : Statistical relationship between a time series and a delayed copy of itself over successive time intervals.
- Forecasting : Predicting future values or trends based on historical data and statistical models.
- Stationarity analysis : Statistical examination of whether data properties remain constant over time.
- Trend analysis : Examination of data patterns over time to identify directional changes and movements.
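A minimal time-series sketch: estimating the autocorrelation of a synthetic autoregressive series with NumPy, i.e. how strongly each value depends on earlier values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic autoregressive series: each value depends on the previous one plus noise.
n = 500
series = np.zeros(n)
for i in range(1, n):
    series[i] = 0.8 * series[i - 1] + rng.normal()

def autocorrelation(x: np.ndarray, lag: int) -> float:
    """Correlation between the series and a copy of itself shifted by `lag` steps."""
    return float(np.corrcoef(x[:-lag], x[lag:])[0, 1])

print(f"lag-1 autocorrelation: {autocorrelation(series, 1):.2f}")  # close to 0.8
print(f"lag-5 autocorrelation: {autocorrelation(series, 5):.2f}")
```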
Research Data Management🔗
Comprehensive approach to organizing, storing, and preserving research data throughout its lifecycle following FAIR principles. We provide tools, training, and guidance for effective data management including openBIS implementation and DMP creation. Our services support compliance with funding agency requirements and institutional policies.
Data Governance and FAIR Principles🔗
Frameworks for managing research data according to FAIR principles—making data Findable, Accessible, Interoperable, and Reusable. We provide comprehensive guidelines and training for implementing effective data governance in your research. Our support includes policy development and compliance strategies.
- Access Management : Controlling and managing user permissions and access rights to data and systems.
- Citation Standards🔗 : Established formats and practices for properly citing and referencing research data and publications.
- Compliance🔗 : Adherence to laws, regulations, and institutional policies regarding data handling and privacy.
- Data Provenance : Record of data origins and processing history that enables traceability, reproducibility, and auditability.
- Findable, Accessible, Interoperable, Reusable (FAIR) : FAIR principles - Guidelines making data Findable, Accessible, Interoperable, and Reusable.
- Licensing : Legal frameworks governing the use, distribution, and modification of data and software.
- Open Access (OA)🔗 : Principle of making research publications freely available to all readers.
- Persistent Identifiers : Stable, globally unique identifiers (e.g., DOI, ORCID) that enable long-term referencing, citation, and linking of research outputs.
- Roles : Defined sets of permissions and responsibilities for different types of system users.
Data Quality and Life Cycle🔗
Management of data throughout its entire lifecycle from collection to archiving, ensuring quality and accessibility at each stage following the FAIR principles. We provide guidance on data lifecycle planning and quality assurance processes. Our support includes DMP creation and data curation best practices. We work together with the 4RIs to share experience and unify processes.
- Archiving : Long-term storage and preservation of data for future access and compliance requirements.
- Data Collection : Systematic gathering of data according to established procedures and standards.
- Data Management Plan (DMP)🔗 : Data Management Plan outlining how research data will be handled throughout a project. This has become a requirement of many funding organizations.
- FAIR principles : Guidelines making data Findable, Accessible, Interoperable, and Reusable.
- Long-term Preservation : Strategies for maintaining data accessibility and integrity over extended periods.
- Processing : Transforming and manipulating raw data into useful information for analysis.
- Reuse : Utilizing existing data, code, or resources for new research purposes or applications. Part of the FAIR principles.
- Sharing : Making research data and resources available to others in the scientific community.
- Storage : Systems and methods for saving and organizing data for future access and use. See also Storage Solutions.
Data Security, Ethics and Legal Compliance🔗
Ensuring research data handling complies with legal, ethical, and security requirements including GDPR, institutional policies, and research ethics. We provide guidance on data protection, anonymization techniques, and compliance frameworks. Our support covers licensing, access control, and ethical data use policies.
- Data Protection : Safeguarding data from unauthorized access, corruption, or loss through security measures.
- Ethical Use : Responsible application of data and technology according to moral principles and guidelines.
- Privacy : Protection of personal information and sensitive data from unauthorized access.
- Regulatory Compliance : Adhering to legal and regulatory requirements for data handling and research practices.
- Sensitive Data : Information requiring special protection due to privacy, security, or regulatory concerns.
Data Stewardship and Training🔗
Education and support for effective research data management practices and tools. We provide regular training sessions on RDM principles, openBIS usage, and data management best practices. Our programs help researchers develop skills for proper data stewardship throughout their careers.
- Best Practices in data management🔗 : Proven methods and techniques that consistently produce superior results in data management.
- Guidance : Directional support and advice for implementing best practices and solving problems.
- Support🔗 : Assistance and guidance provided to users for technical and procedural questions.
- Training🔗 : Educational programs and resources for developing skills in research data management.
Interoperability and Data Pipelines🔗
Systems and workflows that enable seamless data flow between different tools and platforms in the research ecosystem. We provide guidance on integrating openBIS, Renku, and other tools to create efficient research workflows. Our expertise includes API integration and automated data processing pipelines.
- Application Programming Interfaces (APIs) : Interfaces and conventions that let software systems communicate, exchange data, and integrate services.
- Automation🔗 : Use of technology to perform tasks with minimal human intervention.
- PyBIS🔗 : Python library for programmatically accessing and manipulating openBIS data.
- Workflow Integration🔗 : Connecting different tools and processes to create seamless research workflows.
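As a hedged sketch of API-based workflow integration, the example below fetches JSON records from a hypothetical REST endpoint with the `requests` library. The URL, query parameter, and response fields are placeholders; tools such as openBIS expose their own APIs (e.g., via pyBIS) with different calls.

```python
# Requires: pip install requests
import requests

# Hypothetical REST endpoint of a data service; replace with the real API URL.
url = "https://example.org/api/v1/datasets"

response = requests.get(url, params={"project": "demo"}, timeout=10)
response.raise_for_status()          # fail early on HTTP errors

for record in response.json():       # assumes the endpoint returns a JSON list
    print(record.get("id"), record.get("name"))
```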
Metadata and Standards🔗
Structured information about data and adherence to domain-specific standards to ensure data findability and reusability. We provide guidance on metadata creation and help implement standardized data documentation practices. Our support includes domain standard selection and metadata schema development.
- Controlled Vocabularies🔗 : Standardized sets of terms used consistently to describe and categorize data.
- Metadata Schemas🔗 : Structured formats for organizing and describing metadata according to standards.
- Naming Conventions🔗 : Standardized rules for consistently naming files, variables, and data elements.
- Ontologies : Formal representations of knowledge domains defining concepts and their relationships.
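A small sketch of a metadata record checked against a controlled vocabulary. The schema, field names, and vocabulary are invented for illustration; real projects would use a community metadata schema.

```python
# Invented, minimal metadata "schema": required fields and a controlled vocabulary.
REQUIRED_FIELDS = {"title", "creator", "date", "instrument"}
INSTRUMENT_VOCAB = {"SEM", "XRD", "NMR"}   # allowed instrument codes

record = {
    "title": "Surface scan of sample 42",
    "creator": "Jane Doe",
    "date": "2024-05-13",
    "instrument": "SEM",
}

missing = REQUIRED_FIELDS - record.keys()
if missing:
    print("Missing metadata fields:", sorted(missing))
elif record["instrument"] not in INSTRUMENT_VOCAB:
    print("Instrument not in controlled vocabulary:", record["instrument"])
else:
    print("Metadata record is valid.")
```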
RDM Software and Infrastructure🔗
Software systems and technical infrastructure that support research data management workflows and processes. We provide expertise in setting up and configuring RDM tools including openBIS for data organization and storage. Our support includes data model design and system integration strategies.
- DMP-Tool🔗 : Software platform for creating, managing, and sharing data management plans.
- Electronic Lab Notebooks (ELNs) : Digital versions of traditional lab notebooks for recording and managing research activities. See also openBIS, which is the most widely used ELN at Empa.
- Git : Distributed version control system for tracking changes in source code during development.
- Long Term Storage🔗 : Storage systems designed for permanent or extended data retention and archiving.
- OpenBIS🔗 : Open-source platform for managing biological and other research data with full data lineage.
- Renku : Platform for reproducible and collaborative research combining data, code, and workflows.
- Zenodo : Open repository platform for sharing and preserving research data and publications.
Software Engineering🔗
Systematic approach to developing, maintaining, and deploying software using engineering principles and best practices. We provide comprehensive software engineering support tailored for research contexts including code quality, testing, and documentation. Our expertise spans multiple programming languages and development methodologies. Specifically for Python, we offer semiannual courses for beginners and intermediate users.
Algorithms and Computation
Mathematical algorithms and computational methods for solving research problems efficiently. We provide expertise in selecting optimal algorithms for your specific use cases and help optimize computational performance. Our team assists with algorithm choices and implementation strategies.
- Computational Efficiency🔗 : Optimizing algorithms and systems to achieve maximum performance with minimal resource usage.
- Data Structures🔗 : Organized formats for storing and managing data efficiently in computer programs. The right choice can improve performance and computation time as well as human interpretability.
- Distributed Systems🔗 : Networked computing systems that coordinate activities across multiple machines to solve a shared large task.
- Graph Algorithms : Computational methods for analyzing and processing graph-structured data and networks.
- Optimisation : Techniques for finding the best solution among many possible alternatives. In software engineering this mostly means optimizing a program's run time, but it can also target storage space, memory allocations, or other technical aspects.
- Parallel Computing : Simultaneous processing of tasks across multiple processors or computers.
- Simulations : Computational modeling of real-world processes for research and prediction purposes.
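As a compact example of a graph algorithm, the sketch below runs a breadth-first search over a small adjacency-list graph to find which nodes are reachable from a start node. The graph itself is a made-up toy example.

```python
from collections import deque

# Small directed graph as an adjacency list (node -> neighbours).
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
    "E": ["A"],
}

def reachable(start: str) -> set[str]:
    """Breadth-first search: visit nodes level by level starting from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

print(reachable("A"))  # {'A', 'B', 'C', 'D'}
```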
Code Optimisation
Techniques to improve runtime, memory use, and scalability through profiling and algorithmic or implementation improvements.
- Memory Management : Techniques for efficiently allocating and deallocating computer memory resources.
- Parallel Processing : Executing multiple operations simultaneously to improve computational efficiency.
- Performance Optimisation : Techniques for improving software speed, efficiency, and resource utilization.
- Profiling : Analyzing software performance to identify bottlenecks and optimization opportunities.
- Vectorisation : Rewriting computations so they operate on whole arrays at once (e.g., using SIMD or array operations) instead of element-by-element loops.
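A minimal profiling sketch with Python's built-in `cProfile` and `pstats`, showing where time is spent in a toy workload; the workload functions are placeholders.

```python
import cProfile
import pstats

def slow_sum(n: int) -> float:
    """Deliberately simple, slow placeholder computation."""
    return sum(i ** 0.5 for i in range(n))

def workload() -> None:
    for _ in range(50):
        slow_sum(100_000)

# Profile the workload and print the five most expensive calls.
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```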
Data Engineering🔗
Designing and building systems for collecting, storing, and processing large volumes of research data efficiently. We help create data pipelines and infrastructure that make your data accessible for analysis and AI applications. Our expertise includes Extract, Transform, Load (ETL) processes, data warehousing, and real-time data processing.
- Data Lake : Centralized repository for storing large amounts of structured and/or unstructured data.
- Data Pipeline : Series of automated processes for moving and transforming data from source to destination.
- Data Quality : Measure of data accuracy, completeness, consistency, and reliability for research purposes and automated processes.
- Data Warehouse : Centralized repository optimized for analysis and reporting of integrated data from multiple sources.
- Ontologies : Formal representations of knowledge domains defining concepts and their relationships.
- Schemas : Structured formats defining the organization and constraints of data.
- Semantic Data : Data enriched with meaning and context to enable better understanding and interoperability.
Deployment🔗
Process of making research software available and operational in production environments with reliability and scalability. We provide expertise in containerization, CI/CD pipelines, and cloud deployment strategies for research applications. We coordinate IT resources for deployment.
- Cloud Deployment🔗 : Deploying applications and services to cloud computing platforms for scalability and accessibility.
- Containerisation : Packaging applications with their dependencies into portable, lightweight containers. This helps avoid the "works on my computer" problem.
- Continuous Integration/Continuous Delivery or Deployment (CI/CD) : Common practices in software development for automated building, testing, and deployment of software.
- DevOps : Practices that combine development and operations to automate delivery, improve reliability, and shorten release cycles.
- Monitoring : Continuous observation of system performance, health, and behavior in production environments.
- Production Environment🔗 : Live system where software applications serve real users and data. As opposed to, e.g., a development environment, which acts as a playground where development does not disturb users.
- Release management🔗 : Process of planning, scheduling, testing and deploying software releases to production in a structured way.
Development🔗
Software development practices tailored for research contexts including rapid prototyping, collaborative development, and reproducible workflows. We provide guidance on development methodologies, code organization, and tool selection for research projects. Our expertise covers both web development and scientific computing frameworks.
- Debugging🔗 : Process of identifying, analyzing, and fixing errors, also called bugs, in software code.
- Dependency Management🔗 : Process of managing external libraries and packages that software projects depend on.
- Open Source🔗 : Software development model where source code is freely available for use and modification.
- Programming Languages🔗 : Formal languages for writing instructions that computers can execute. Examples include Python, Java, C++, C, Rust, Javascript, R, Fortran, MATLAB and many more.
- Python🔗 : High-level programming language widely used for scientific computing and data analysis. We offer a course semiannually.
- Scaling🔗 : Adjusting computational resources to handle varying workload demands.
- Testing : Process of evaluating software functionality to ensure it meets requirements and quality standards. It is an integral part of modern software development.
- Version Control (Git) : Version control is a system for tracking changes in files and coordinating work among multiple developers. Git is currently the most widely used version control system for software development.
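A small testing sketch: a made-up function and two pytest-style unit tests for it. With `pytest` installed, tests in such a file are discovered and run automatically.

```python
# Requires: pip install pytest   (run the tests with the `pytest` command)
import pytest

def celsius_to_kelvin(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to Kelvin."""
    if celsius < -273.15:
        raise ValueError("below absolute zero")
    return celsius + 273.15

def test_freezing_point():
    assert celsius_to_kelvin(0.0) == 273.15

def test_below_absolute_zero_rejected():
    with pytest.raises(ValueError):
        celsius_to_kelvin(-300.0)
```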
Research Software Engineering (RSE)🔗
A discipline at the intersection of traditional software engineering and scientific research, applying software engineering principles and practices to research contexts and addressing requirements unique to research such as reproducibility and interdisciplinary collaboration.
- Intersection of traditional software engineering and scientific research : Applying software engineering practices to scientific code to improve quality, reproducibility, and sustainability.
Software Architecture🔗
Design principles and patterns for structuring complex software systems to meet research requirements including performance, scalability, and maintainability. We provide guidance on architectural decisions and system design for research applications. Our expertise covers both monolithic and distributed system architectures, both in the cloud and on premise solutions.
- Architectural Patterns🔗 : Reusable solutions to commonly occurring problems in software architecture design.
- Design Patterns : Reusable solutions to common problems in software design and architecture.
- Microservices : Architectural approach splitting a large program or application into small, independent services that communicate over networks.
- Modularity🔗 : Design principle organizing code into separate, interchangeable components with defined interfaces.
- Scalability : Ability of systems to handle increased workload by adding resources dynamically or through preallocation.
- System Design : Process of defining architecture, components, and interfaces for complex software systems.
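A brief illustration of modularity and a simple design pattern (the strategy pattern): interchangeable components behind one small interface, so an analysis step can be swapped without touching the rest of the system. The functions are placeholders.

```python
from typing import Callable

# The "strategy" interface: any function mapping a list of numbers to a number.
Aggregator = Callable[[list[float]], float]

def mean(values: list[float]) -> float:
    return sum(values) / len(values)

def maximum(values: list[float]) -> float:
    return max(values)

def summarise(values: list[float], aggregator: Aggregator) -> str:
    """The surrounding code depends only on the interface, not on one implementation."""
    return f"summary = {aggregator(values):.2f}"

data = [1.0, 4.0, 2.5]
print(summarise(data, mean))     # components are interchangeable...
print(summarise(data, maximum))  # ...without changing summarise()
```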
Software Lifecycle🔗
Management of software development from conception through deployment and maintenance, adapted for research environments with evolving requirements. This is particularly important in research environments with high staff turnover, where software without proper lifecycle management quickly becomes outdated and unmaintainable due to missing documentation. We provide guidance on lifecycle models, requirements management, and version control strategies.
- Design : Planning and structuring software architecture, user interfaces, and system components.
- Documentation🔗 : Written materials that explain how to use, understand, or maintain software and systems.
- Expectation Management : Setting and communicating realistic expectations about project outcomes and timelines.
- Implementation : Process of converting designs and plans into working software code.
- Licensing : Legal frameworks governing the use, distribution, and modification of data and software.
- Maintenance : Ongoing activities to keep software systems functional, secure, and up-to-date.
- Open Source🔗 : Software development model where source code is freely available for use and modification.
- Requirements🔗 : Specifications defining what a software system should do and how it should perform, without prescribing the technical implementation.
- Sustainability🔗 : Ensuring long-term viability and accessibility of research data and systems.
- Testing : Process of evaluating software functionality to ensure it meets requirements and quality standards. It is an integral part of modern software development.