We are a small study group. We discuss probability, statistics, and data science.
You can join our group correlationzero.
All resources related to the discussed topics can be found here.
Some of our blog posts:
A Novel Classification-Rejection Model [30th Mar 2025]: We often focus on classification, but some observations, despite belonging to a particular class, do not fully adhere to its properties. Identifying and analyzing these exceptions is crucial. This novel model performs classification and outlier detection simultaneously. The model outperformed existing approaches, such as Isolation Forest, across various benchmarks.
Analysis of Precipitation Extremes for Indian Cities [24th Jan 2025]: This project models high-intensity weather events using two approaches: block maxima and peaks over threshold. The Generalized Extreme Value (GEV) distribution and the Generalized Pareto Distribution (GPD) were used to model the events and calculate return levels. The cities are grouped decade-wise, based on exceedances of the quartile threshold, to track the stability and shifts of heavy rainfall in the cities.
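As a rough illustration of the return-level calculation for the block-maxima approach, the sketch below evaluates the standard GEV quantile formula and checks it against the GEV distribution function. The function names and the parameter values are hypothetical, chosen only for illustration; they are not fitted values from the project.

```python
import math

def gev_return_level(mu, sigma, xi, period):
    """Level exceeded on average once every `period` blocks (e.g. years)."""
    if period <= 1:
        raise ValueError("period must be greater than 1")
    y = -math.log(1.0 - 1.0 / period)
    if abs(xi) < 1e-12:  # Gumbel limit as xi -> 0
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function, used here only to check the return level."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0:  # z lies outside the support of the distribution
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

# Hypothetical parameters for annual maximum daily rainfall (mm)
mu, sigma, xi = 100.0, 30.0, 0.1
for period in (10, 50, 100):
    level = gev_return_level(mu, sigma, xi, period)
    # The CDF at the return level should equal 1 - 1/period
    print(period, round(level, 1), round(gev_cdf(level, mu, sigma, xi), 4))
```

In practice the three parameters would be estimated from the block maxima (for example by maximum likelihood) before computing return levels.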
Application of Skew-Normal Distribution in Portfolio Optimization under Bayesian Paradigm [27th Dec 2024]: We construct a skew-normal distribution from a normal one and use it for portfolio optimization. Specifically, we add an extra skewness parameter to the efficient frontier model, with all parameters estimated using a Bayesian approach. In this way, we not only utilize the prior information about the stock returns, but also exploit the positive skewness in the stock return distribution.
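One standard way to build a skew-normal variable from normal ones is Azzalini's stochastic representation, sketched below. Whether the post uses exactly this construction is an assumption, and the shape value `alpha = 4.0` is hypothetical. The sketch covers only sampling, not the Bayesian portfolio model.

```python
import math
import random

def sample_skew_normal(alpha, rng):
    """One draw from a standard skew-normal with shape parameter alpha."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    u0 = rng.gauss(0.0, 1.0)  # independent standard normals
    u1 = rng.gauss(0.0, 1.0)
    # Mixing a half-normal |u0| with a normal u1 introduces the skewness
    return delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1

rng = random.Random(0)
alpha = 4.0  # hypothetical shape value; alpha > 0 gives positive skew
draws = [sample_skew_normal(alpha, rng) for _ in range(50_000)]

sample_mean = sum(draws) / len(draws)
# Known mean of the standard skew-normal: delta * sqrt(2 / pi)
theoretical_mean = (alpha / math.sqrt(1.0 + alpha * alpha)) * math.sqrt(2.0 / math.pi)
print(round(sample_mean, 3), round(theoretical_mean, 3))
```

Setting `alpha = 0` recovers the ordinary standard normal, which is why the shape parameter can be read as an added skewness parameter.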
Dynamic Time Warping [2nd Aug 2024]: Dynamic Time Warping (DTW) is a method to calculate the distance, or measure the similarity, between two time series when their lengths differ or they are out of phase. For example, imagine two people walking similarly, but one walking slower than the other. Pointwise distance metrics such as Euclidean or Manhattan cannot be used for comparison in such cases. Since DTW is a distance measure, we can use it for time series clustering.
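The walking example can be sketched with the classic dynamic-programming form of DTW. This is a minimal version, assuming one-dimensional series, an absolute-difference local cost, and no warping-window constraint. The function name `dtw_distance` and the toy series are illustrative.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

slow = [0, 0, 1, 1, 2, 2, 3, 3]  # same walk, half the speed
fast = [0, 1, 2, 3]
print(dtw_distance(slow, fast))  # 0.0: the series align perfectly after warping
```

The two series have different lengths, so a pointwise Euclidean comparison is not even defined here, while DTW reports them as identical in shape.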
A New Correlation Coefficient [11th Apr 2024]: A correlation coefficient, denoted by \( \xi_{n}(X,Y) \) (xicor), was introduced recently. This coefficient, being non-parametric, is simple to calculate while being able to identify non-monotonic and non-linear relationships. Moreover, it has some excellent asymptotic properties. We compared this new coefficient with the most popular existing correlation measures.
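To show how simple the calculation is, here is a minimal sketch of the rank-based formula for \( \xi_{n}(X,Y) \). It assumes ties in X are kept in input order (the original proposal breaks them at random) and that Y is not constant. The function name `xi_correlation` is illustrative, and the quadratic-time rank computation is chosen for readability rather than speed.

```python
def xi_correlation(x, y):
    """Rank-based xi coefficient; assumes y is not constant."""
    n = len(x)
    # Order the pairs by x (ties in x stay in input order in this sketch)
    order = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    # r[i]: number of y values <= y_sorted[i]; l[i]: number of y values >= y_sorted[i]
    r = [sum(1 for v in y if v <= yi) for yi in y_sorted]
    l = [sum(1 for v in y if v >= yi) for yi in y_sorted]
    num = n * sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    den = 2 * sum(li * (n - li) for li in l)
    return 1 - num / den

xs = list(range(-50, 51))
parabola = [v * v for v in xs]    # non-monotonic: y is a function of x
print(xi_correlation(xs, parabola))
print(xi_correlation(xs, xs))     # perfectly monotonic relationship
```

The coefficient is not symmetric: `xi_correlation(x, y)` measures how well y is explained as a function of x, so swapping the arguments generally changes the value.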
Multithreading and Multiprocessing [28th Feb 2024]: A discussion of what multithreading and multiprocessing are, how they differ, and when and how to use each.
Simulated Annealing [8th Jan 2024]: Simulated Annealing is a method to find an approximate global optimum. Initially, we explore different solutions using a random walk procedure, but as the iterations proceed, we try to settle down. Exploration is necessary to escape local optima. This method has a philosophical aspect as well: initially, we explore many things in life, but eventually, we try to settle down.
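The explore-then-settle idea can be sketched as follows. This is a minimal one-dimensional version, assuming a uniform random-walk proposal, the Metropolis acceptance rule, and a geometric cooling schedule. The test function `bumpy` and all parameter values are hypothetical.

```python
import math
import random

def simulated_annealing(f, x0, rng, n_iter=5000, t0=10.0, cooling=0.999, step=1.0):
    """Minimize f starting from x0; returns the best point seen and its value."""
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    temp = t0
    for _ in range(n_iter):
        cand = x + rng.uniform(-step, step)  # random-walk proposal
        f_cand = f(cand)
        delta = f_cand - fx
        # Always accept improvements; accept worse moves with probability
        # exp(-delta / temp), which shrinks as the temperature drops
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            x, fx = cand, f_cand
            if fx < best_f:
                best_x, best_f = x, fx
        temp *= cooling  # geometric cooling: explore early, settle later
    return best_x, best_f

def bumpy(x):
    """Toy objective with more than one local minimum."""
    return x * x + 10.0 * math.sin(x)

best_x, best_f = simulated_annealing(bumpy, x0=8.0, rng=random.Random(42))
print(round(best_x, 3), round(best_f, 3))
```

At high temperature almost every move is accepted, which lets the walk climb out of a local minimum. As `temp` shrinks, uphill moves become rare and the search settles into whichever basin it currently occupies.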