|Date and time
|18.10.2023 at 14:15, TU Dortmund, M/ E 21
|20.10.2023 at 9 am, TU Dortmund University, M/ E 27
Title: Causal Discovery under Scaled Noise: Identifiability and Robust Estimation
Alexander Marx, Postdoctoral Fellow at the ETH AI Center, Zurich
|25.10.2023 at 14:15, TU Dortmund University, M/ E 21
|01.11.2023 at 14:15, TU Dortmund University, M/ E 21
|no seminar (All Saints' Day)
|08.11.2023 at 14:15, TU Dortmund University, M/ E 21
Title: High-Dimensional L2Boosting: Rate of Convergence
Martin Spindler, Professor at the Faculty of Business Administration of the University of Hamburg, with a focus on statistics and its applications in business administration
|15.11.2023 at 14:15, TU Dortmund University, M/ E 21
Title: Isotonic Distributional Regression and Total Positivity
Lutz Dümbgen, Professor of Statistics, University of Bern
In regression analyses, one is often interested in estimating the full conditional distribution of a response Y, given a covariate X, rather than just its mean, median or other features. Under the assumption that the conditional distribution of Y, given that X = x, is in some sense monotone increasing in x, this problem is solvable. Here one could work with the usual stochastic order or the stronger likelihood ratio order. This talk highlights both paradigms. Furthermore, we discuss connections between the likelihood ratio order and total positivity of bivariate distributions.
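As a minimal illustration of the idea (not the speaker's implementation), isotonic distributional regression under the usual stochastic order can be sketched by fitting, for each threshold t, an antitonic (non-increasing in x) least-squares regression to the indicators 1{Y ≤ t} via the pool-adjacent-violators algorithm; the function and variable names below are hypothetical:

```python
def pava(y, increasing=True):
    """Pool-adjacent-violators: least-squares monotone fit to a sequence y."""
    if not increasing:
        return [-v for v in pava([-v for v in y])]
    blocks = []  # each block is [sum, count]; block means must be non-decreasing
    for v in y:
        blocks.append([v, 1])
        # merge adjacent blocks while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

def idr_cdfs(x, y, thresholds):
    """For each threshold t, estimate F(t | x_i) = P(Y <= t | X = x_i),
    constrained to be non-increasing in x (stochastic order in x)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    cdfs = {}
    for t in thresholds:
        ind = [1.0 if y[i] <= t else 0.0 for i in order]
        fit = pava(ind, increasing=False)
        cdfs[t] = dict(zip([x[i] for i in order], fit))
    return cdfs
```

With x = [1, 2, 3, 4] and y = [1, 2, 3, 4], the estimated conditional CDF at threshold 2.5 is 1 for small x and 0 for large x, as the stochastic-order constraint dictates.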
|22.11.2023 at 14:15, TU Dortmund, M/ E 21
Title: Statistical depth: Geometry of multivariate quantiles
Stanislav Nagy, Assistant Professor, Charles University Prague, Czech Republic
Statistical depth is a non-parametric tool applicable to multivariate and non-Euclidean data. Its goal is to reasonably generalise quantiles to multivariate and more exotic datasets. The first depth was proposed in statistics in 1975; rigorous investigation of depths started in the 1990s, and still, an abundance of open problems stimulates research in the area. We discuss two seminal depths: (i) the halfspace depth (Tukey, 1975) and (ii) the simplicial depth (Liu, 1988). We unveil surprising links between these depths and well-studied concepts from geometry and discrete mathematics. Using these relations, we partially resolve several open problems, among them the 30-year-old characterization conjecture, which asks whether two different distributions can correspond to the same halfspace depth.
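To make the halfspace depth concrete (a brute-force sketch, not from the talk): the Tukey depth of a point z is the minimum, over closed halfplanes whose boundary passes through z, of the fraction of sample points the halfplane contains. In two dimensions it can be approximated by scanning directions:

```python
import math

def halfspace_depth_2d(z, points, n_dirs=720):
    """Approximate Tukey (1975) halfspace depth of z w.r.t. a 2D sample:
    scan n_dirs directions u and count points in the halfplane
    {x : u . (x - z) >= 0}; return the smallest fraction found."""
    n = len(points)
    depth = 1.0
    for k in range(n_dirs):
        theta = 2.0 * math.pi * k / n_dirs
        ux, uy = math.cos(theta), math.sin(theta)
        count = sum(1 for (px, py) in points
                    if ux * (px - z[0]) + uy * (py - z[1]) >= 0)
        depth = min(depth, count / n)
    return depth
```

For the four points (±1, 0), (0, ±1), the centre (0, 0) has depth 1/2 (every halfplane through it contains at least two of the four points), while a point far outside the sample has depth 0.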
|29.11.2023 at 14:15, TU Dortmund, M/ E 21
|06.12.2023 at 14:15, TU Dortmund, M/ E 21
|PhD Student Colloquium
|13.12.2023 at 14:15, TU Dortmund, M/ E 21
Title: Prediction Bands for Functional Time Series
Stathis Paparoditis, Professor at the Department of Mathematics and Statistics of the University of Cyprus
A bootstrap procedure for constructing prediction bands for a stationary functional time series is proposed. The procedure exploits a general vector autoregressive representation of the time-reversed series of Fourier coefficients appearing in the Karhunen-Loève representation of the functional process. It generates backward-in-time functional replicates that adequately mimic the dependence structure of the underlying process in a model-free way and have the same conditionally fixed curves at the end of each functional pseudo-time series. The bootstrap prediction error distribution is then calculated as the difference between the model-free, bootstrap-generated future functional observations and the functional forecasts obtained from the model used for prediction. This allows the estimated prediction error distribution to account for the innovation and estimation errors associated with prediction as well as for possible errors due to model misspecification. We establish the asymptotic validity of the bootstrap procedure in estimating the conditional prediction error distribution of interest, and we also show that the procedure enables the construction of prediction bands that achieve (asymptotically) the desired coverage. Prediction bands based on a consistent estimation of the conditional distribution of the studentized prediction error process are also introduced. Such bands take the local uncertainty of prediction into account more appropriately. Through a simulation study and the analysis of two data sets, we demonstrate the capabilities and the good finite-sample performance of the proposed method.
|20.12.2023 at 14:15, TU Dortmund, M/ E 21
|PhD Student Colloquium
|24.01.2024 at 14:15, TU Dortmund, M/ E 21
Title: Testing for Dependence by Using Ordinal Patterns: an Introduction
Christian Weiß, Professor of Quantitative Methods in Economics at HSU Hamburg
About 20 years ago, ordinal patterns were introduced as a simple, robust, and flexible tool for analyzing the serial dependence structure of univariate real-valued stochastic processes. If applied to continuously distributed processes, one can derive non-parametric tests of the null hypothesis that the process is independent and identically distributed. Recently, more sophisticated tasks for dependence tests have also been considered: serial dependence in a discrete-valued process, the sequential monitoring of serial dependence, cross-dependence in a multivariate process, and spatial dependence in a random field. This talk provides an introduction to the aforementioned topics together with illustrative examples, and it concludes by outlining perspectives for future research.
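The basic i.i.d. test can be sketched as follows (an illustrative sketch, not the speaker's method): under an i.i.d. continuous series, each of the m! ordinal patterns of a window of length m is equally likely, so pattern counts over disjoint windows can be compared to the uniform distribution with a chi-square statistic:

```python
from itertools import permutations

def ordinal_pattern(window):
    """Rank-order pattern of a window, assuming no ties
    (continuous marginals)."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))

def pattern_chisq(series, m=3):
    """Chi-square statistic comparing ordinal-pattern frequencies in
    disjoint windows of length m against the uniform distribution over
    all m! patterns (which holds for an i.i.d. continuous series).
    Disjoint windows keep the counts independent under the null."""
    counts = {p: 0 for p in permutations(range(m))}
    n_win = len(series) // m
    for k in range(n_win):
        counts[ordinal_pattern(series[k * m:(k + 1) * m])] += 1
    expected = n_win / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts.values())
    return stat, len(counts) - 1  # statistic and degrees of freedom
```

A strictly increasing series of length 30 produces only the pattern (0, 1, 2) in all ten windows, yielding a large statistic (50 on 5 degrees of freedom), so the i.i.d. hypothesis would be clearly rejected.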
|Date and time
|19.04.2023 at 14:15, TU Dortmund, M/ E 21
|PhD Student Colloquium
|03.05.2023 at 14:15, TU Dortmund, M/ E 21
Title: Affine-equivariant inference for multivariate location under Lp loss functions
Dr. Alexander Dürre, Assistant Professor, Mathematical Institute, Leiden University, Netherlands
We consider the problem of estimating the location of a d-variate probability measure. The well-known multivariate mean can be defined as the minimizer of the expected squared Euclidean loss. Its respective estimator, the sample mean, is optimal under normality, but behaves poorly under heavy tails. In the one-dimensional setting, the median is therefore often preferred if heavy tails cannot be ruled out. Contrary to the mean, it is defined as the minimizer of the expected absolute loss. Its intuitive multivariate generalization, the spatial median, minimizes the expected Euclidean loss. However, its estimator is not affine-equivariant, which can lead to a very low efficiency. We propose a collection of Lp location estimators that minimize the size of suitable ℓ-dimensional data-based simplices. For ℓ = 1, these estimators reduce to minimizers of empirical Euclidean losses, whereas, for ℓ = d, they are equivariant under affine transformations. Irrespective of ℓ, these estimators reduce to the sample mean for p = 2, whereas for p = 1, the estimators provide the spatial median and the Oja median for ℓ = 1 and ℓ = d, respectively. Under very mild assumptions, we derive an explicit Bahadur representation and establish asymptotic normality for the new estimators. To allow for large sample size n and/or large dimension d, we introduce a version of our estimators relying on incomplete U-statistics. We also define related location tests and derive explicit expressions for the asymptotic power under contiguous local alternatives. Data applications illustrate the importance of the choice of ℓ and p.
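For reference, the spatial median mentioned above (the ℓ = 1, p = 1 case) is commonly computed with Weiszfeld's algorithm, a distance-weighted averaging iteration; a minimal sketch, assuming no iterate lands exactly on a data point:

```python
import math

def spatial_median(points, n_iter=200, eps=1e-12):
    """Weiszfeld's algorithm: the spatial median minimizes
    sum_i ||x_i - m|| (Euclidean loss); iterate an inverse-distance
    weighted average, starting from the sample mean. Sketch only;
    eps guards against division by zero."""
    d = len(points[0])
    m = [sum(p[j] for p in points) / len(points) for j in range(d)]
    for _ in range(n_iter):
        w_sum = 0.0
        num = [0.0] * d
        for p in points:
            dist = math.sqrt(sum((p[j] - m[j]) ** 2 for j in range(d))) + eps
            w = 1.0 / dist
            w_sum += w
            for j in range(d):
                num[j] += w * p[j]
        m = [num[j] / w_sum for j in range(d)]
    return m
```

The lack of affine equivariance discussed in the abstract is easy to see here: rescaling one coordinate of the data changes the Euclidean distances and hence the weights, so the estimate does not transform accordingly.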
|10.05.2023 at 14:15, TU Dortmund, M/ E 21
|PhD Student Colloquium
|24.05.2023 at 14:15, TU Dortmund, M/ E 21
Title: Optimal sensor location for spatiotemporal systems
Professor Dariusz Uciński, Institute of Control and Computation Engineering, University of Zielona Góra, Poland
For dynamic systems described by partial differential equations it is impossible to observe their states over the entire spatial domain. A typical example is air pollution, which is modelled by a convection-diffusion equation. When an experiment is to be carried out to estimate the unknown system parameters, a major decision problem is how to locate discrete sensors so as to obtain the most valuable information about these parameters. Researchers and practitioners alike do not doubt that placing sensors in an "intelligent" manner may lead to dramatic gains in the achievable accuracy of the parameter estimates, so efficient sensor location strategies are highly desirable. At the same time, the complexity of the sensor location problem implies that there are very few sensor placement methods which are readily applicable to practical situations. What is more, they are little known among researchers. The aim of the talk is to give an account of both classical and recent original work on optimal sensor placement strategies for parameter identification in spatiotemporal processes. The reported work concerns the development of new techniques and algorithms, or the adaptation of methods which have been successful in the related fields of optimal control and optimum experimental design. In the planning stage, real-valued functions of the Fisher information matrix of the parameters are primarily employed as the performance indices to be minimized with respect to the sensor positions. Particular emphasis is placed on determining the "best" way to guide moving and scanning sensors, and on making the solutions independent of the parameters to be identified. A couple of case studies regarding the design of air quality monitoring networks serve as an illustration, showing the strength of the proposed approach in studying practical problems.
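One common building block behind such Fisher-information criteria is D-optimal selection; a hypothetical toy sketch (not the speaker's algorithm), where each candidate sensor location contributes a parameter-sensitivity vector g and locations are picked greedily to maximize the determinant of the accumulated information matrix M = Σ g gᵀ:

```python
def greedy_d_optimal(candidates, k, ridge=1e-9):
    """Greedy D-optimal sensor selection sketch for a 2-parameter model:
    pick k candidate locations (sensitivity vectors g) that maximize
    det(M) with M = sum of g g^T; ridge keeps M invertible early on."""
    def det2(M):
        return M[0][0] * M[1][1] - M[0][1] * M[1][0]
    chosen, M = [], [[ridge, 0.0], [0.0, ridge]]
    for _ in range(k):
        best, best_gain, best_M = None, -1.0, None
        for i, g in enumerate(candidates):
            if i in chosen:
                continue
            Mi = [[M[0][0] + g[0] * g[0], M[0][1] + g[0] * g[1]],
                  [M[1][0] + g[1] * g[0], M[1][1] + g[1] * g[1]]]
            gain = det2(Mi)
            if gain > best_gain:
                best, best_gain, best_M = i, gain, Mi
        chosen.append(best)
        M = best_M
    return chosen
```

With candidates (1, 0), (0, 1), (1, 0) and k = 2, the greedy rule picks the two complementary directions rather than two copies of the same one, reflecting the determinant's preference for informative, non-redundant sensors.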
|31.05.2023 at 14:15, TU Dortmund, M/ E 21
Title: Statistical Plasmode Simulations - Potentials and Challenges
Dr. Nicholas Schreck, Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg
Statistical data simulation is essential in the development of statistical models and methods as well as in their performance evaluation. To capture complex data structures, in particular for high-dimensional data, a variety of simulation approaches have been introduced, including parametric and so-called plasmode simulations. While there are concerns about the realism of parametrically simulated data, it is widely claimed that plasmodes generate realistic data with some aspect of the "truth" known. However, there are neither explicit guidelines nor an established state of the art for performing plasmode data simulations. We review the existing literature, and we motivate and introduce the concept of statistical plasmode simulation. We discuss advantages and challenges of statistical plasmodes and provide a step-wise procedure for their generation, including key steps for their implementation and reporting. Throughout the talk, we illustrate the concept of statistical plasmodes as well as the proposed plasmode generation procedure by means of a public real RNA dataset on breast carcinoma patients.
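The core plasmode idea, keeping real covariates fixed and simulating only the outcome from a model with known "true" parameters, can be sketched as follows; the linear-Gaussian outcome model and all names here are hypothetical choices for illustration, not the procedure proposed in the talk:

```python
import random

def plasmode_outcomes(X, beta, sigma, n_sims, seed=0):
    """Plasmode sketch: hold the real covariate matrix X fixed and
    simulate n_sims outcome vectors y = X beta + noise, where beta
    plays the role of the known 'truth' (e.g. estimated from the
    real data) and the noise is Gaussian with std. dev. sigma."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        y = [sum(b * xj for b, xj in zip(beta, row)) + rng.gauss(0.0, sigma)
             for row in X]
        sims.append(y)
    return sims
```

Because the covariates are taken from a real dataset, their joint distribution, including correlations that are hard to specify parametrically, is preserved by construction, while the outcome-generating mechanism remains fully known.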
|07.06.2023 at 14:15, TU Dortmund, M/ E 21
Title: Uniform Inference in High-Dimensional Additive Models
Prof. Dr. Jannis Kück, Professor of Economics, esp. Data Science in Economics, Düsseldorf Institute for Competition Economics (DICE), Heinrich Heine University Düsseldorf
We develop a method for uniformly valid confidence bands of a nonparametric component f1 in the additive model Y = f1(X1) + ... + fp(Xp) + e in a high-dimensional setting. We employ sieve estimation and embed it in a high-dimensional Z-estimation framework that allows us to construct uniformly valid confidence bands for the first component f1. Our study extends the existing results for inference in high-dimensional additive models and clarifies the required assumptions. In a setting where the number of regressors p may increase with the sample size, a sparsity assumption is critical for our analysis. Furthermore, we run simulation studies that show that our proposed method delivers reliable results concerning the estimation and coverage properties even in small samples. Finally, we illustrate our procedure in an empirical application demonstrating the implementation and the use of the proposed method in practice.
|14.06.2023 at 14:15, TU Dortmund, M/ E 21
Title: On the smoothed empirical distribution function
Prof. Dr. Henryk Zähle, Mathematics Department, Saarland University
In this talk, I consider a kernel-based smoothing of the empirical distribution function of a sample of size n. I will first present results on the existence and the exact rate of convergence to zero (as n → ∞) of a MISE-minimal bandwidth. I then discuss two data-based choices of the bandwidth which turn out to be quite good and, in particular, lead to strongly consistent estimates of the unknown underlying distribution function. Finally, I also discuss the asymptotics of the corresponding (smoothed) empirical process.
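For orientation, the object under study can be written down in a few lines (a generic sketch with a Gaussian kernel; the bandwidth choices analysed in the talk are not reproduced here):

```python
import math

def smoothed_edf(sample, h):
    """Kernel-smoothed empirical distribution function with a Gaussian
    kernel: F_{n,h}(t) = (1/n) * sum_i Phi((t - X_i) / h), where Phi is
    the standard normal CDF; h is the bandwidth."""
    n = len(sample)
    def Phi(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return lambda t: sum(Phi((t - x) / h) for x in sample) / n
```

As h → 0 this recovers the ordinary empirical distribution function; the MISE-minimal bandwidth discussed in the talk balances the smoothing bias against the variance reduction.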
|22.06.2023 at 16:45, TU Dortmund, CDI 120
Title: Automated Claim Detection for Assisting Fact Checkers
Sami Nenno, Alexander von Humboldt Institute for Internet and Society, Berlin
Disinformation and fake news are not new phenomena, but in recent years they have become an increasing problem for public discourse and democracies around the world. Even though the number of fact checking organizations has increased as well, journalists often express the need for computational tools to handle the flood of disinformation. Accordingly, in computer science, and in NLP especially, there has been active research on automating parts of the fact checking pipeline. Claim detection, the task of automatically retrieving textual claims that require checking, has received the most interest among fact checkers. However, researchers in the field often define the task as classifying "checkworthy" claims without critically discussing what checkworthiness actually means or involves. I review some attempts and datasets and highlight their major shortcomings. I argue for a different task design that takes inspiration from what is known as "news values" in communication science. News values are criteria that guide journalists in selecting events that are worth being reported. In a similar vein, detecting checkworthy claims can be modeled as classifying claims to truth that meet certain criteria that are relevant with regard to disinformation and fact checking. I present an annotated dataset for classifying claims to truth, show some empirical results of models trained on it, and discuss how this task integrates into a broader pipeline for detecting checkworthy claims.
|09.08.2023 at 14:00, TU Dortmund, M/ E 21
Title: Optimistic bias in the evaluation of statistical methods: illustrations and possible solutions
Christina Sauer, PhD student in the working group Biometry in Molecular Medicine, Faculty of Medicine, LMU Munich
Statistical methods are a core element of empirical research, necessitating a comprehensive and unbiased evaluation of their strengths and weaknesses. Such evaluation should ideally be conducted not only by the method's authors but also by other researchers in subsequent comparison studies. In both cases, however, the high degree of flexibility in assessing method performance, including data set choice, competing methods, and evaluation criteria, can substantially impact the conclusions drawn for the investigated methods. In the worst case, this flexibility, combined with researchers' hopes and expectations, can lead to optimistically biased results. In this talk, I show an example of a benchmark study where systematic variations of benchmark components allow us to present any of the compared methods as superior or inferior as desired. Additionally, I discuss the results of a "cross-design validation experiment", where we select two methods designed for the same data analysis task, reproduce the original results, and then reevaluate each method based on the study design (i.e., datasets, competing methods, and evaluation criteria) that was used to show the abilities of the other method. Finally, I present an idea on how to make simulation studies more realistic and less prone to optimistic bias by systematically basing them on a sample of real data sets that were selected according to predefined inclusion criteria.