The informal research seminar of the ALGA group. Talks last roughly 25 minutes, with five extra minutes allocated for discussion. Many talks cover recent conference papers or serve as practice runs for upcoming conferences. New members are often asked to give an overview of their field of research. Talks by invited speakers may take 45–60 minutes including questions.
To be kept up-to-date about noon seminar presentations, please subscribe to the algoseminar-l mailing list.
This is the schedule for 2021. Schedules from 2005 through 2021 are available.
Send an email to Alex to schedule your talk. No talks are planned at the moment.
We perform an experimental analysis of $k$-means clustering in the sliding window model in two-dimensional space. We experiment on the algorithms presented by Ackermann et al. and Youn et al. for $k$-means clustering on streaming data and sliding windows. We vary the parameters of the algorithms and select values that give a good trade-off between clustering quality and space usage on a well-distributed Gaussian dataset. We then devise an algorithm that combines concepts from both of these algorithms and analyze its parameters to find values that work well for clustering on the same type of dataset.
UPD: the talks have been moved online.
The talks are 15 minutes long. We schedule 45 minutes per talk to accommodate questions and feedback.
1. Bram Custers — Coordinated Schematization for Visualizing Mobility Patterns on Networks
2. Willem Sonke — Volume from Outlines on Terrains
Title: Coordinated Schematization for Visualizing Mobility Patterns on Networks
Abstract: GPS trajectories of vehicles moving on a road network are a valuable source of traffic information. However, the sheer volume of available data makes it challenging to identify and visualize salient patterns. Meaningful visual summaries of trajectory collections require that both the trajectories and the underlying network are aggregated and simplified in a coherent manner. In this paper we propose a coordinated fully-automated pipeline for computing a schematic overview of mobility patterns from a collection of trajectories on a street network. Our pipeline utilizes well-known building blocks from GIS, automated cartography, and trajectory analysis: map matching, road selection, schematization, movement patterns, and metro-map style rendering. We showcase the results of our pipeline on two real-world trajectory collections around The Hague and Beijing.
Title: Volume from Outlines on Terrains
Abstract: Outlines (closed loops) delineate areas of interest on terrains, such as regions with a heightened risk of landslides. For various analysis tasks it is necessary to define and compute a volume of earth (soil) based on such an outline, capturing, for example, the possible volume of a landslide in a high-risk region. In this paper we discuss several options to define meaningful 2D surfaces induced by a 1D outline, which allow us to compute such volumes. We experimentally compare the proposed surface options for two applications: similarity of paths on terrains and landslide susceptibility analysis.
Origami is the art of folding paper (from ‘oru’, meaning folding, and ‘kami’, meaning paper). In origami, one tries to fold some shape from a single piece of paper without cutting the paper, letting it pass through itself, or stretching or shrinking it. In the map folding problem, we are given an $m \times n$ regular grid paper, where the boundary of each cell is a crease with either a mountain or a valley assignment. The problem is to decide whether all the creases can be folded flat. In this thesis we introduce and explore a new map folding variant: equilateral triangle map folding. This variant changes the type of grid, and asks whether a hexagonal paper on a triangular grid with assigned creases can be folded flat.
The constraint satisfaction problem (CSP) asks whether it is possible to find an assignment for a number of variables such that a set of constraints is satisfied. While the complexity of a CSP depends on the types of allowed constraints, many variants are NP-hard; well-known instances include $q$-Graph Coloring and $d$-CNF-SAT for $d > 2$. One way to solve NP-hard CSPs more efficiently is through kernelization: a formalization of preprocessing that reduces the original problem to a smaller equivalent one without changing the answer. We explore how the size of the set of suitably preprocessed constraints for a CSP depends on the number of variables and the types of allowed constraints. Chen, Jansen and Pieterse showed how to construct a kernel with a constraint set of linear size in the number of variables for CSPs over Boolean domains whose allowed constraint types are so-called “balanced”. Their approach relies on representing the constraints by low-degree polynomials; computing a basis for these polynomials then makes it possible to sparsify the set of constraints. However, earlier work only considered applying this method to Boolean-domain CSPs. In this thesis we extend the method of Chen, Jansen and Pieterse to non-Boolean finite domains, and show how to identify which CSPs have a kernel of size $O(n^t)$ for $t > 1$, thus going beyond linear kernelization. For Boolean CSPs we show that when the allowed constraint types are so-called $t$-balanced, it is possible to construct polynomials of degree $t$ that represent the constraints of the CSP; these polynomials can then be used to sparsify the CSP as described above. A similar approach applies to non-Boolean CSPs by first rewriting their constraints to equivalent binary constraints and then applying a framework similar to the Boolean one.
For a specific type of CSP, known as 3-Uniform Hypergraph 3-Coloring, we show that a kernel with $O(n^{3-\epsilon})$ constraints for $\epsilon > 0$ is unlikely to exist. This problem is similar to 3-Graph Coloring except for the graph structure: in a 3-uniform hypergraph, each edge consists of 3 vertices instead of 2, and a coloring is proper if each edge contains at least 2 distinctly colored vertices. This can be viewed as a CSP where each constraint consists of 3 variables and each variable can be assigned a value from 1 to 3.
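For readers unfamiliar with the hypergraph variant, the properness condition above (at least 2 distinctly colored vertices per edge, i.e., no monochromatic edge) is easy to state in code. A minimal sketch, with a brute-force solver that is exponential and purely illustrative:

```python
from itertools import product

def is_proper(edges, coloring):
    """An edge (u, v, w) of a 3-uniform hypergraph is properly colored
    if its three vertices use at least two distinct colors."""
    return all(len({coloring[u], coloring[v], coloring[w]}) >= 2
               for u, v, w in edges)

def three_colorable(n, edges):
    """Brute force over all 3^n colorings of n vertices (illustration only)."""
    return any(is_proper(edges, c) for c in product((1, 2, 3), repeat=n))
```

Viewed as a CSP, each edge contributes one constraint on 3 variables over the domain {1, 2, 3}.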
The talks are 20–25 minutes long. We schedule 50 minutes per talk to accommodate questions and feedback.
1. Pantea Haghighatkhah — Obstructing Classification via Projection
2. Aleksandr Popov — Uncertain Curve Simplification
3. Mart Hagedoorn and Max van Mulken — Dots & Boxes Is PSPACE-Complete
In this thesis, we present a reconfiguration algorithm for self-reconfigurable modular robots. The robot consists of cubic modules that are able to move by pivoting around neighbouring modules. We show that if a configuration of a modular robot is admissible, which means that it does not contain any of three forbidden patterns, then our algorithm can transform it into a straight line of modules. Using this intermediate state, our algorithm can transform a robot between any two admissible configurations in $O(n^3)$ moves. Previous work by Sung et al. [ICRA 2015] showed that it is possible to transform a robot between any two admissible configurations, but additionally required that the configurations contain only convex holes.
Pick-and-place machines put electronic components onto printed circuit boards. Scheduling component placements performed by the robots of such a machine can be seen as a Vehicle Routing Problem. Kulicke & Soffa (K&S) is currently developing a pick-and-place machine where each robot has two placement heads that can place components simultaneously, giving rise to the Paired-Vehicle Routing Problem. We present a novel algorithm for this problem and experimentally compare it to K&S's own prototype. By combining the best parts of both algorithms we can reduce the makespan of K&S's schedules by 9% on average.
The talk is ~15 minutes; we schedule 15 extra for questions and feedback.
We study metrics that assess how close a triangulation is to being a Delaunay triangulation, for use in contexts where a good triangulation is desired but constraints (e.g., maximum degree) prevent the use of the Delaunay triangulation itself. Our near-Delaunay metrics derive from common Delaunay properties and satisfy a basic set of design criteria, such as being invariant under similarity transformations. We compare the metrics, showing that each can make different judgments as to which triangulation is closer to Delaunay.
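The best-known Delaunay property from which such metrics can be derived is the empty circumcircle. A minimal sketch of the standard incircle sign test (assuming counter-clockwise triangles and ignoring exact-arithmetic concerns; this is the textbook predicate, not one of the paper's metrics):

```python
def in_circumcircle(a, b, c, d):
    """For a counter-clockwise triangle (a, b, c), the lifted 3x3
    determinant is positive iff d lies strictly inside its circumcircle."""
    rows = [(px - d[0], py - d[1], (px - d[0])**2 + (py - d[1])**2)
            for px, py in (a, b, c)]
    m = rows
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    return det > 0
```

A triangulation is Delaunay exactly when no vertex lies inside the circumcircle of any triangle; near-Delaunay metrics quantify how badly this test fails.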
In this thesis, we discuss the problem of scale-dependent road generalization for use in automatic cartography software based on OpenStreetMap data. The goal is to decide which roads should be shown such that we get an accurate and informative map while keeping it readable. Previous work does not provide a complete, computationally fast procedure that incorporates the simplification of complex highway interchanges and double roads and works in different areas of the world; existing efficient alternatives make a naive selection based on road classes. This thesis first investigates the problems with such an approach and then describes a new computationally fast strategy that, using a stroke detection algorithm, combines a new thinning algorithm with a road selection algorithm. Maps generated with this approach are evaluated both qualitatively and quantitatively; from this evaluation, we conclude that the strategy creates high-quality maps.
Research in various humanities fields, such as history of ideas, involves historical literature research. This starts with defining a corpus, i.e. a set of texts specified by bibliographical data, relevant to the research project and acquiring copies of these texts in order to analyze them.
Researchers get online access to increasingly large federated catalogs containing bibliographical data (such as WorldCat) and repositories of digitized texts (such as Google Books). A new approach to history of ideas, computational history of ideas, aims to employ these developments to enlarge the evidence basis for a wide-scope historical investigation.
A challenge this new approach faces is that the bibliographic data available online varies in quality and structure. This is problematic since humanities researchers must, for example, know precisely which edition of a book they are analyzing, and be certain that they have identified all books relevant to their research project.
This thesis presents the technical and user interface design of a framework created in collaboration with the Concepts in Motion team of the University of Amsterdam, who represent the users of this framework. The framework supports researchers in compiling text corpora using online catalogs and repositories. This way, bigger corpora with more accurate bibliographic data can be modeled and explored in a user-defined model. In addition, the framework keeps a detailed history of activity, which allows researchers to investigate and ensure the legitimacy of conclusions drawn from its results. Other features include discussions and access control. The framework also prepares for the future integration of features such as automated text analysis and applications in other fields.
Besides presenting a design, this thesis also includes a production-ready implementation of a subset of the design. This allows future projects to build on the results of this thesis and deploy the framework for use in actual research projects.
The quality of geometric networks like road networks is often measured based on distances: a good network should provide relatively short routes between the nodes of the network. In this thesis we study the robustness of an important class of geometric networks, namely geometric spanners. The vertex set of a geometric graph is a set of points in the plane (or some higher-dimensional Euclidean space) and the edges are straight line segments, weighted with the Euclidean distance between their endpoints. For some $t > 1$, such a network is a $t$-spanner if the length of the shortest path between any two vertices is at most $t$ times their Euclidean distance. The factor $t$ is called the dilation (or stretch factor, spanning ratio) of the network. Since the spanning ratio of the complete graph is always one, the challenge is to construct spanners with constant dilation that have as few edges as possible. There are several constructions of linear-size spanners, which are optimal. Naturally, more edges are needed for spanners that can resist failures. We give several constructions of spanners that can survive massive failures, while having only a few edges.
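For concreteness, the dilation of a small geometric graph can be computed directly from its definition: run a shortest-path computation from every vertex and take the worst ratio of graph distance to Euclidean distance. A brute-force sketch (not one of the thesis's constructions), assuming a connected graph:

```python
import heapq
import math

def dilation(points, edges):
    """Spanning ratio t of a geometric graph: the maximum over vertex
    pairs of shortest-path length divided by Euclidean distance.
    O(n^2 log n) via Dijkstra from every vertex; assumes connectivity."""
    n = len(points)
    dist = lambda u, v: math.dist(points[u], points[v])
    adj = [[] for _ in range(n)]
    for u, v in edges:            # edges are weighted by Euclidean length
        w = dist(u, v)
        adj[u].append((v, w))
        adj[v].append((u, w))
    t = 1.0
    for s in range(n):
        d = [math.inf] * n
        d[s] = 0.0
        pq = [(0.0, s)]
        while pq:                 # Dijkstra with lazy deletion
            du, u = heapq.heappop(pq)
            if du > d[u]:
                continue
            for v, w in adj[u]:
                if du + w < d[v]:
                    d[v] = du + w
                    heapq.heappush(pq, (d[v], v))
        for v in range(n):
            if v != s:
                t = max(t, d[v] / dist(s, v))
    return t
```

For example, the boundary cycle of a unit square has dilation $\sqrt{2}$: the two diagonal pairs are at Euclidean distance $\sqrt{2}$ but graph distance 2.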
Thesis: https://pure.tue.nl/ws/portalfiles/portal/173826728/thesis
Grid maps are used to display data coupled with spatial information. Depending on the amount of data, the spatial information has to be distorted for the sake of the data's visibility. In a grid map, each spatial element is condensed into a simple tile (such as a square or hexagon); the tiles are then arranged such that their global shape matches the global shape of the input with the least amount of spatial distortion, and such that each tile is visually identifiable by its spatial information. We analyze the state-of-the-art solution for generating such grid maps and identify avenues of improvement, which we pursue in this paper. We improve two steps of this solution: the partitioning of an input map based on salient features, and its conversion into a square-tile mosaic cartogram. The main improvements over the state of the art are a guaranteed valid mosaic cartogram conversion and a significant runtime improvement. We discuss our methodology, provide an implementation which we apply to one of the datasets used by the state-of-the-art solution (a map of mainland Netherlands and its municipalities), compare it to the state-of-the-art result, and discuss further avenues of improvement for our implementation.
Anomaly detection is a classical topic in machine learning and data mining, with a wide range of applications. As the amount of information shared online grows rapidly, the need for parallelizable anomaly detection algorithms increases. In this thesis, we propose the Random Shift Forest (RSF) algorithm to detect anomalies. RSF is an unsupervised anomaly detection algorithm with a low time complexity. It is an ensemble method that uses partitioning to isolate anomalies. To evaluate RSF, we use it on synthetic and real datasets. Compared to a state-of-the-art algorithm, RSF performs favourably in terms of ROC–AUC. The straightforward construction of RSF makes it amenable to big data models. In particular, we show that RSF can be implemented in the distributed and the streaming model using sparse recovery techniques.
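The details of RSF are not given in the abstract; the following only sketches the general idea behind isolating anomalies through random partitioning (in the spirit of Isolation Forest): anomalies tend to be separated from the rest of the data after far fewer random splits than normal points. All names and parameters here are illustrative, not the thesis's implementation:

```python
import random

def isolation_depth(data, point, depth=0, max_depth=10):
    """Depth at which `point` is isolated by random axis-aligned splits;
    anomalies are typically separated after only a few splits."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = random.randrange(len(point))
    lo = min(p[dim] for p in data)
    hi = max(p[dim] for p in data)
    if lo == hi:
        return depth
    split = random.uniform(lo, hi)
    # keep only the side of the split that contains `point`
    side = [p for p in data if (p[dim] < split) == (point[dim] < split)]
    return isolation_depth(side, point, depth + 1, max_depth)

def anomaly_score(data, point, trees=100):
    """Average isolation depth over an ensemble; lower = more anomalous."""
    return sum(isolation_depth(data, point) for _ in range(trees)) / trees
```

An ensemble of such partitions averages out the randomness of individual trees, which is also what makes the approach easy to parallelize.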
Trajectory segmentation is the problem of partitioning a trajectory into different movement phases. There is a large body of work on the problem of segmenting individual trajectories, such as Behavioral Change Point Analysis (BCPA) which fits a model to statistical features of trajectories to compute likely points of change in behavior. However, there is very little work on algorithms for segmenting a set of trajectories. The aim of this work is to generalize segmentation to multiple trajectories in a model-based context.
We define a graph-like representation of model-based segmentations and an information criterion that can evaluate the quality of these representations, based on their complexity and the models fit to the movement phases and breakpoints of these phases. Our representation is a directed acyclic graph, wherein each edge represents a movement phase of a group of subtrajectories, and vertices represent changes in movement or grouping behavior. We generalize movement models including BCPA to this group setting, and formalize our optimization objective with these generalized models.
Furthermore, we present three heuristic algorithms to compute such segmentations. The first is a clustering approach, which builds clusters of close subtrajectories with similar movement from a trajectory set and greedily selects a subset of the clusters to form a valid segmentation. We relate the cluster selection problem to Weighted Set Cover to provide context on the problem and our greedy solution. We also present an incremental method, where we build a complete segmentation by adding trajectories to the segmentation one by one. We define the subproblem of adding a trajectory to a segmentation and describe how our solution to this problem yields a complete segmentation. The last method solves the problem in two steps. It first discovers which sets of subtrajectories are considered to be grouped, by computing the trajectory grouping structure, a graph that represents subgroups in a set of trajectories. These subgroups are then segmented separately using a dynamic programming algorithm that generalizes the existing algorithm for model-based segmentation of individual trajectories.
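The relation to Weighted Set Cover suggests the classical greedy rule: repeatedly pick the set with the best weight per newly covered element. A minimal sketch of that rule on abstract sets (the mapping from clusters to sets is the thesis's contribution and is not reproduced here); it assumes the universe is coverable:

```python
def greedy_set_cover(universe, sets, weights):
    """Classical greedy for Weighted Set Cover: repeatedly choose the set
    minimizing weight per newly covered element (ln n approximation)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = min((i for i, s in enumerate(sets) if s & uncovered),
                   key=lambda i: weights[i] / len(sets[i] & uncovered))
        chosen.append(best)
        uncovered -= sets[best]
    return chosen
```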
Finally, we present results obtained from experiments on synthetically generated trajectories and real data sets of moving animal and human entities. We used a simple implementation of the heuristic segmentation methods to run these experiments with different methods and models. We conclude our work with a discussion of the results and future work on model-based segmentation of trajectory groups.
Programmable matter is a concept in the field of distributed computing: it refers to a collection of synthetic particles which can be programmed to change the physical properties of the material that they form. Leader election is a common and important problem in distributed computing, in which a single node must be chosen to lead the execution of a larger algorithm. In the context of programmable matter, leader election is a significantly complex problem, both due to the large number of particles in a system and due to the storage and computational limitations of the individual particles. In this thesis, several leader election algorithms for particle systems of programmable matter are evaluated both experimentally and analytically. We consider the Amoebot model for programmable matter, which describes how particles can move, compute, and interact with one another. Four different leader election algorithms are implemented in AmoebotSim, a simulator for the Amoebot model. Using experiments we determine which of these algorithms is fastest, and which factors influence the running time of each algorithm. Furthermore, we investigate the theoretical robustness and probability of failure of the algorithms where applicable. This information should be helpful for selecting a leader election algorithm for given types of input particle systems, and may also help in the development of new algorithms with different performance characteristics or assumptions.
EuroCG has 12-minute talks with 3-minute breaks for questions and switching. We schedule 15 minutes extra per person for feedback and inevitable technical issues.
1. Aleksandr Popov — Uncertain Curve Simplification
2. Bram Custers — Route Reconstruction from Traffic Flow via Representative Trajectories
3. Nathan van Beusekom — Crossing Numbers of Beyond-Planar Graphs Revisited
EuroCG has 12-minute talks with 3-minute breaks for questions and switching. We schedule 15 minutes extra for feedback and inevitable technical issues.
UPD: Pantea’s talk has been rescheduled to 6th April.
The aim of this thesis is to determine to what extent the exact route driven can be reconstructed from a GPS trajectory using a continuous hidden Markov model (CHMM). We present a short overview of various map matching methods and a more extensive overview of the hidden Markov model (HMM). Current HMMs are discrete: they compute with the closest projected position, which is not necessarily the optimal point. The CHMM instead defines the measurement and transition probabilities in a continuous way, as probability distributions over all points. Using data from HERE Technologies we test the performance of the CHMM for various noise levels and sampling intervals. The CHMM outperforms some models for short sampling intervals and has a similar overall performance to others. We observe two main limitations. First, the CHMM fails to take the correct route into account when finding the shortest path between candidates if the sampling interval is too long. Second, if the distance traversed on the road is considerably larger than the distance between consecutive measurements, the transition probability becomes too small and the model is unable to find the correct route. The latter limitation may be overcome by a different projection technique for the transition probability: one presented in this thesis uses time instead of distance, which requires a good estimate of the average velocity, as it is very sensitive to traffic conditions and the accuracy of the recorded vehicle velocity. The results show that, these limitations aside, the CHMM is a simple but effective continuous map matching method.
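For context, the discrete HMM formulation that the CHMM generalizes (the widely used Newson–Krumm setup) models the measurement probability as a Gaussian of the distance between the GPS fix and a candidate road position, and the transition probability as an exponential penalty on the difference between on-road and straight-line distance. A minimal sketch; the parameter values are illustrative, and this is not the thesis's CHMM:

```python
import math

def measurement_prob(gps_point, candidate, sigma=4.07):
    """Gaussian emission: likelihood of observing gps_point if the vehicle
    is at the candidate road position (sigma models GPS noise, in meters)."""
    d = math.dist(gps_point, candidate)
    return math.exp(-0.5 * (d / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def transition_prob(route_dist, straight_dist, beta=5.0):
    """Exponential transition: penalizes candidate pairs whose on-road
    distance differs from the straight-line distance between the fixes."""
    return math.exp(-abs(route_dist - straight_dist) / beta) / beta
```

The second limitation noted above shows up directly in `transition_prob`: when the on-road distance greatly exceeds the straight-line distance, the exponential term collapses toward zero.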