Thursday, September 11, 2014

Call for Nominations: EATCS Distinguished Dissertation Award 2014

The EATCS has established the EATCS Distinguished Dissertation Award to promote and recognize outstanding dissertations in the field of Theoretical Computer Science.

Any PhD dissertation in the field of Theoretical Computer Science that has been successfully defended in 2014 is eligible.

Three dissertations will be selected by the committee for year 2014. The dissertations will be evaluated on the basis of originality and potential impact on their respective fields and on Theoretical Computer Science.

Each of the selected dissertations will receive a prize of 1000 Euro. The award receiving dissertations will be published on the EATCS web site, where all the EATCS Distinguished Dissertations will be collected.

The dissertation must be submitted by the author as an attachment to an email message sent to the address giuper@gmail.com with subject EATCS Distinguished Dissertation Award 2014 by 31 December 2014. The body of the message must specify:

  • Name and email address of the candidate;
  • Title of the dissertation;
  • Department that has awarded the PhD and denomination of the PhD program;
  • Name and email address of the thesis supervisor;
  • Date of the successful defence of the thesis.
A five page abstract of the dissertation and a letter by the thesis supervisor certifying that the thesis has been successfully defended must also be included. In addition, the author must include an endorsement letter from the thesis supervisor and can include one more additional endorsement letters.

The dissertations will be selected by the following committee:
  • Javier Esparza
  • Fedor Fomin
  • Luke Ong
  • Giuseppe Persiano
The award committee will solicit the opinion of members of the research community as appropriate.

Theses supervised by members of the selection committee are not eligible.

The EATCS is committed to equal opportunities, and welcomes submissions of outstanding theses from all authors.

Monday, September 08, 2014

Report of TRENDS 2014 (Guest post by Ilaria Castellani and MohammadReza Mousavi)

Ilaria Castellani and MohammadReza Mousavi have kindly written this guest post reporting on TRENDS 2014, a satellite event of CONCUR 2014 organized by IFIP WG1.8 on Concurrency Theory. Enjoy it!

TRENDS 2014 is the third edition of the series of workshops organized by the IFIP Working Group 1.8 on Concurrency Theory. TRENDS traditionally comprises invited speeches by both promising young researchers and leading senior researchers to demonstrate the emerging trends in concurrency theory research. The workshop is followed by the business meeting of the working group, which is open to the public. This year's TRENDS was attended by 20 participants throughout the workshop and featured 4 excellent talks by the following first class speakers:
  1. Alexandra Silva gave an introduction to using co-algebras as a generic way of exploiting efficient algorithms in automata theory for minimization and equivalence checking of labelled transition systems (or different variants thereof) with respect to various notions of behavioural equivalence. In particular, she showed how Brzozowski's algorithm for minimization of finite automata can be used to minimize LTS's efficiently and also how Hopcroft and Karp's algorithm for checking language equivalence can be used to check equivalence of LTS's with respect to different notions of behavioural equivalence. She also presented some ideas about the application of learning algorithms for automata in the domain of concurrency theory. She concluded her talk by the following motto: Co-algebra is not only about semantics, but also about algorithms. Alexandra's slides are here.
  2. Simon Gay gave an overview of the history of session types and in particular remembered the legacy of the late Kohei Honda as the founding father of this field. He then pointed subsequent important developments in the field, such as the introduction of behavioural sub-typing by himself and the link to linear logic and the very interesting recent interpretation of the Curry-Howard isomorphism in the concurrency theory setting by Luís Caires, Frank Pfenning and associates. He also gave an overview of the future challenges in this field.
  3. Michele Bugliesi, who has recently been appointed as the Rector of the University of Venice, pinpointed security vulnerabilities in the current practice of authentication. Then, he presented the solutions devised by him and his associates through client-side protection of authentication cookies. The devised techniques were formally shown to guarantee security (provide authentication and integrity) on a formalization of the Firefox browser. The proof techniques use a noninterference-like notion, which exemplifies yet another application of concurrency theory.
  4. Stéphanie Delaune gave a presentation on the formal modelling and analysis of cryptographic protocols. She used a privacy issue in the old French electronic passport as a motivating example and showed how process algebraic formalisms, such as the applied pi-calculus, can be used in modelling the protocols used in such an e-passport (at an abstract level) and how behavioural equivalences (equipped with additional term equivalences on terms) can be used to verify these protocols. She pointed out several existing tools for this purpose, also developed within her group at ENS Cachan, and presented the challenges in her ongoing research.

Friday, August 29, 2014

Two recent events in Reykjavik: Crossroads of Art and Science and the 10th ICE-TCS Theory Day

The autumn semester 2014 started on Monday, 18 August, for us at the School of Computer Science at Reykjavik University (SCS). As usual at this time of the year, teaching-related matters took centre stage and occupied most faculty members, especially since we had just welcomed our third intake of well over 300 new computer science students. However, many of us at the Icelandic Centre of Excellence in Theoretical Computer Science (ICE-TCS) will remember the first week of the new academic year for two events showcasing the artistic and scientific work of Erik Demaine, who visited ICE-TCS  and the School of Computer Science at Reykjavik University in the period 20-27 August.

The first event, Crossroads of Art and Science, took place on Thursday, 21 August 2014, from 5pm till 7pm GMT, and was organized  by  ICE-TCS, SCS  and Vísindafélag Íslendinga (the Icelandic Academy of Sciences). It was attended by about 220 people, including a good number of our undergraduate students, and featured a keynote address by Erik entitled Folding Paper: Visual Art Meets Mathematics. 



Art and science are often viewed as very different, and somewhat antithetic, human endeavours, so much so that some artists happily profess ignorance of the basic methods and results of science and some scientists sneer at art. In 1959, this prompted British scientist and novelist C. P. Snow to express his view that "the intellectual life of the whole of western society" was split into two separate cultures---namely the sciences and the humanities---and that this was a major hindrance to solving the world's problems.

However, we can all mention key figures from the Renaissance and earlier times who were both artists and scientists. Is the breed of the Renaissance man a thing from a long gone past? The aim of Crossroads of Art and Science was to provide resounding evidence that the distinction between art and science in modern society is fictitious. We did so by showcasing three figures of polymaths whose work, be it artistic or scientific, benefits from the interplay of art and science.

Erik Demaine needs no introduction to the readers of this piece. He  helped start the field of Computational Origami, but he is also an artist, some of whose work has been exhibited at the MoMA, amongst other venues. He uses art to explore the feasibility of some of his mathematical ideas and mathematics to suggest artistic ideas, leading, for instance, to paper or glass sculptures. To quote Erik himself:
"One of our growing realizations over the years is that mathematics itself is an art form, and I think that's what attracted both of us to this area in the first place. [I]t has the same kind of creativity, the same kinds of aesthetics, that you get in art: You want a clean problem to solve, and you want an elegant solution to that problem. Both art and mathematics are about having the right ideas [and] executing those ideas in some convincing way, so in that sense they're indistinguishable." (See here for the video of that presentation.)
In case you have not done so already, do look at the inspirational video Erik produced when he received the Presburger Award 2013.



Crossroads of Art and Science also featured two Icelandic speakers. 
  • Anna Hrund Másdóttir is an artist and a mathematics teacher who teaches mathematics to support her art practice and whose teaching benefits from her artistic work. 
  • Kjartan Emilsson, a mathematical physicist by training, uses artistic methods of working and thinking in his work as Principal Game Designer and Studio Managing Director at CCP in Shanghai, where he directs the innovation and implementation of core virtual world game designs for CCP Games. 
The thesis underlying Crossroads of Art and Science was that art and science are not antithetic, and that their cooperation can lead to powerful ways for solving scientific problems and to creating new art forms. As Nicholasof Cusa put it in his philosophy, and as Anna Hrund wrote in the title of her presentation, we are in the presence of a "coincidence of opposites". We like to think that the people attending the event went home agreeing with that message.

If you understand Icelandic, you might enjoying listening to an excellent radio interview with Magnús M.  Halldórsson where he discusses the event and its theme with the host of the show. (The interview starts at around minute 15:40 and lasts for about 14 minutes.)

The second keynote address delivered by Erik Demaine was part of the 10th ICE-TCS Theory Day and was entitled Replicators, Transformers, and Robot Swarms: Science Fiction through Geometric Algorithms.. Erik's talk wasvery well attended and the audience included a good number of undergraduate students. During the talk, Erik presented some of his work that draws inspiration from science fiction and presented some answers to the following questions:
  1. How can we build reconfigurable robots like Transformers or Terminator 2? 
  2. How can we orchestrate the motion of a large swarm of robots? 
  3. How can we build Star Trek-style replicators that duplicate or mass-produce a given shape at the nano scale?
While addressing the first question, Erik gave us a whirlwind tour of his work on the equivalence of models for modular robots, hinged dissections (see this Wikipedia page for a brief introduction) and robotic paper. He also mentioned some of the work done in the context of the Millibiology project at MIT. Of course, the recently unveiled origami robot featured in this part of Erik's presentation, as well as some of its early incarnations.



In his discussion of problems related to robot swarms, Erik mentioned the distributed boundary detection algorithm he developed with James McLurkin, which is suitable for use on multi-robot systems with dynamic network topologies. The game of Tilt also featured in this part of Erik's talk, together with a sketch of the proof of its NP-hardness.




Finally, we were given a glimpse of some of Erik's fascinating work related to replicators. Two highlights were staged self-assembly of Wang tiles and shape replication in the Wang tile self-assembly model. Amongst other things, Erik showed the audience how to make infinitely many copies of any shape without holes in a single stage!

Last, but by no means least, the Theory Day 2014 included two presentations by members of ICE-TCS. Marjan Sirjani discussed some work done by her research group in Reykjavik and Tehran on analysis of network-on-chips using probabilistic timed actors. This work uses the language Rebeca and its tool set.  Christian Konrad closed the programme by presenting work on Robust Set Reconciliation, which was selected for presentation at SIGMOD 2014.

At the end of the scientific programme, the attendees mingled while being treated to chocolate and sparkling wine.

As you can see, the fun continued after dinner with an impromptu origami masterclass where people attempted to fold a hyperbolic paraboloid (Albers 1927). The instructions are here.





We held the first Theory Day in 2005, the day after the opening of ICE-TCS, and we have been having one such event every year since then.The list of foreign guests we have had so far for the Theory Day reads as follows:

  • 2005: Ryan Hayward (Alberta), Kim G. Larsen (Aalborg), Mogens Nielsen (Aarhus)
  • 2006: Wan Fokkink (VU Amsterdam), Jan Kratochvil (Prague), Moshe Vardi (Rice)
  • 2007: Thomas Erlebach (Leicester), Tommi Sottinen (Helsinki)
  • 2008: None, but we had all the ICALP invited speakers in July. 
  • 2009: Zoltan Esik (Szeged), Paul van Tilburg (Eindhoven)
  • 2010: David de Frutos Escrig and Carlos Gregorio-Rodriguez (both at the Universidad Complutense de Madrid).
  • 2011: None
  • 2012: None
  • 2013: Pierluigi Crescenzi (Firenze), Pierre Fraigniaud (Paris Diderot) and Stephan Holzer (ETH)
  • 2014: Erik Demaine (MIT)
We will see what the future will bring and look forward to celebrating the tenth birthday of ICE-TCS next year. Whatever we do, it will be hard to top the impact and national visibility of last week's events.

Sunday, July 13, 2014

Day 4 and Recap of ICALP 2014 (guest post by Andrew Winslow)

Here is the last guest post by Andrew Winslow  from ICALP 2014. Enjoy it! Thanks to Andrew for his conference reports. 

This [Friday, 11 July 2014] is the last day of ICALP!

George Mertzios, Sotiris Nikoletseas, Christoforos Raptopoulos and Paul Spirakis. Determining Majority in Networks with Local Interactions and very Small Local Memory

This talk focused on a problem in distributed computing on a graph with stateful nodes. Given a graph with nodes colored red and blue, a population protocol is a set of rules that define how the colors of a pair of endpoints of an edge can be transformed. For instance, a population protocol might include a rule to turn the colors of a pair of red nodes sharing to both be blue. One is also allowed to add additional colors not found in inputs, and the total number of colors that appear in the input and population protocol is the number of states of the protocol.
A population protocol for the majority problem transforms the colors of all (or all but a (1-ε) fraction) of the nodes into the color appearing most frequently in the input graph. In may be possible at any point during the execution of the protocol that more than one rule can be applied to the endpoints of more than one edge, and it is assumed that either the rules are chosen randomly (a randomized scheduler) or advesarially subject to some basic requirements, like that any rule that can be applied is applied after a finite amount of time (a fair scheduler). One can also ask about the amount of time (i.e. the number of rule applications) needed for a majority to be reached.

It was previously known that a 3-state protocol for majority exists in the case that both the graph is a clique and the ratio of the occurrence counts of most common color over least common color is ω(sqrt(n)log(n))[1]. In this work, Mertzios, Nikoletseas, Christoforos, and Spirakis constructed a four-state protocol that correct computes majority with probability 1 on any graph and with any difference between the number of occurrences of the two colors, assuming a fair scheduler. They also prove that the 3-state protocol described in [1] is strictly worse, failing to converge to the correct majority with high probability for an infinite family of graphs (graphs consisting of a clique at the end of a chain), even with a probabilistic scheduler.

[1] D. Angluin, J. Aspnes, and D. Eisenstat. A simple population protocol for fast robust approximate majority. Distributed Computing, 21(2): 87-102, 2008.

Karl Bringmann, Fabian Kuhn, Konstantinos Panagiotou, Ueli Peter and Henning Thomas. Internal DLA: Efficient Simulation of a Physical Growth Model.

Karl gave the talk. This work is about improving the speed of sampling the results of the following process, called internal diffuse limited aggregation or IDLA:
  1. Place a particle at origin in the square lattice.
  2. Take a random walk with this particle until a lattice location not containing another particle is found.
  3. Place this particle at the empty location.
As one might expect, the resulting shape of the particle locations is usually roundish. The roundness can be captured by considering the ratio of the radii of the smallest containing and largest contained circles for the shape, centered at the origin. It is known that the expected difference in the ratios of these circles is O(log(n)), i.e. that the shape is usually quite round.
The IDLA process comes from statistical physics, and natural problems in this area related to understanding the distribution of shapes generated by this process. The naive approach to randomly sample the result of the IDLA process takes O(n^2) time, since the expected radius is Theta(n^0.5), a randomly walk from the origin takes Theta(n) expected time to reach an empty location (usually on the boundary), and n particles must be simulated.
Bringmann, Kuhn, Panagiotou, Peter, and Thomas improve this to O(nlog^2(n)) expected time and O(n^0.5*log(n)) expected space with high probability. The idea of the algorithm is to efficiently carry out the random walk by making large jumps, carrying out a significant portion of the walk in O(1) expected time. An upper bound on the number of jumps can then be obtained by computing the time expected to leave the region containing previous points (with radius O(sqrt(n)*log(n))).

Recap

This was my first time attending ICALP, and overall had a great experience. The conference was well-organized and the work presented was consistently very high quality and also had a great diversity to it; every session had a fairly distinct theme. One of the more unique aspects of the conference are the three topical tracks, and the mixing and interaction between "Track-A people" and "Track-B people" (and don't forget Track-C people...) led to some great discussions.

Friday, July 11, 2014

July 9, 2014 - July 10, 2014 ICALP (guest post by Clément Canonne)

Here is the second guest post from ICALP 2014 kindly written by Clément Canonne. Enjoy it! (This  post is also available in PDF. Any typo or typesetting issue with the HTML version is my responsibility.)

Fast Algorithms for Constructing Maximum Entropy Summary Trees (Howard Karloff)

This talk [1] revolved around a question. A simple question, really. “How to draw a rooted tree?” Not any tree, though: a big one. A huge, gigantic n-node tree T, with n » ⌊222.891112⌋ [2], on a tiny sheet of paper (or a screen, for that matter). Most of the time, the approach is either to
  • draw the whole thing, get a binocular and hope something can be seen at all;
  • go for interaction, and draw/zoom on parts of the graph on-the-fly.
Here, none of this: T is T, magnificent n-node weighted tree; and we are given a parameter k ≥ n, to be though of as k « n; the task is to draw a k-node tree T, the “best summary” of T. (At this point – a demo! From the Mathematical Genealogy Project: the tree being made of the academic descendants of a obscure mathematician, someone named Gauss. That’s a hell of a tree.)
The key of their approach is not to draw only k carefully chosen nodes of T, and dismiss the others; but instead to allow T to combine subsets of T’s nodes into a big, meta-node named “others” (†).
Sure. But what is a “good summary” anyway? Quite naturally, the authors went for entropy – a tree is good if, assigning weights to nodes in the straightforward fashion (for v ∈ T, ωv ∝ the total weight of original nodes of T in the subtree of T rooted at v), the entropy


is big. That is, if T is as balanced as possible, or phrased differently remains very informative.

So, what’s the problem? The problem is exactly (†), when it comes to computing such an optimal k-node summary tree. Indeed, if we allow an algorithm to combine an arbitrary subset of a node v into a metanode othersv (and we do!), there is the slight problem of having exponentially many possibilities – and that could turn out pretty bad. An intuitive idea [3] would be to sort v’s children by non-decreasing weights, and combine together a prefix (the first, say, ℓ of them) – they shouldn’t be too important anyway. Sadly, it does not work – there are counter examples, even for n = 7 and k = 4.

In a previous work [4], Kenneth Shirley and Howard Karloff managed to give a pseudopolynomial algorithm (polynomial in the total weight), and a polynomial-time (additive) approximation one. This works came for the kill, removing the “pseudo” to give
  • a O(k2n + nlog n)-time algorithm for computing an optimal k-node summary tree;
  • a (better) O(n + poly(k,1∕ε))-time approximation algorithm
as well as a faster (albeit non-provably optimal) greedy algorithm. The crux of the approach? Prefixes don’t work, alright. But extended prefixes do! Instead of (after sorting a node v’s children) merging the nodes {1,…,ℓ}, it is sufficient to merge the {1,…,ℓ,ℓ} for some ℓ ≥ ℓ + 1 (yes, it is surprising to me too).
And voilà! Polynomial time.

Testing Equivalence of Polynomials under Shifts (Rafael Mendes de Oliveira)

And now, some testing [5]! Rafael has a device: a blackbox computing a n-variate polynomial P ∈ F[X1,…,Xn] (for some fixed field F). And another one – computing Q ∈ F[X1,…,Xn]. On inputs x ∈ Fn, these two devices allow him to compute P(x), Q(x) in constant time.

Rafael also has two coauthors, Zeev Dvir and Amir Shpilka, and a question:
Question (Shift Equivalence Testing (SET)). Given a bound on their degree, decide if P and Q are morally the same polynomial. That is, if there is a shift vector a such that P(⋅ + a) = Q(⋅) (and if so, please, find one.)
This is arguably a natural generalization of the infamous Polynomial Identity Testing (PIT) problem, where the goal is to decide whether P is the all-zero polynomial or not. PIT does not have any known efficient deterministic algorithm, but a very simple polynomial-time randomized one; and the task of improving this state of affairs has ramifications in many areas.
So SET must be difficult, since it not only generalizes the question, but also asks for an explicit witness – this vector a. After giving some background on what is known about related questions (is P “shift-equivalent” to a polynomial that can be represented with very few non-zero coefficients, for instance), Rafael gave the main result of their paper:
Theorem. There is an efficient (randomized) algorithm for SET. Furthermore, the only randomized part comes from solving instances of PIT.
In other words, in spite of what it seems, SET is not much harder than PIT; and any improvement on the use of randomness for PIT directly translates to the corresponding improvement for SET!
Roughly speaking, the idea of the proof is to try to find a “one step at a time”, by successively coming up with candidates ad, ad-1, …, a1. How? Find ad that matches the degree-d part of P(⋅ + a) and Q(⋅) – this is just a linear system to solve, along with a call to a randomized subroutine for PIT. Then find ad-1 that matches both the degree-d and degree-(d - 1) parts of P(⋅ + a) and Q(⋅) – this no longer linear, alas. But, and that’s the trick, one can plug the known value of ad in the quadratic system to make it collapse to a linear system, which has the same solutions as the quadratic one!
And so on, and so forth: after computing ak+1, one just has to solve a linear system to get ak, using the nice properties of ak+1 (simplifications “trickles down”), before calling PIT to make sure the result is correct so far. And at the end of the day, either a1 is a good shift-vector, or there is none – and a final call the PIT subroutine lets us know which of the two cases hold.
Hop.
  • [1] Fast Algorithms for Constructing Maximum Entropy Summary Trees, R. Cole and H. Karloff. ICALP, 2014.
  • [2] This is a prime number. Check it out.
  • [3] Rule of thumb: when I write “intuitive”, it is a trap.
  • [4] Maximum Entropy Summary Trees, K. Shirley and H. Karloff, Computer Graphics Forum. 2013.
  • [5] Testing Equivalence of Polynomials under Shifts, Z. Dvir, R.M de Oliveira, and A. Shpilka. ICALP, 2014.

Thursday, July 10, 2014

Day 3 of ICALP 2014 (guest post by Andrew Winslow)

Here is the third guest post by Andrew Winslow  from ICALP 2014. Enjoy it!

Invited Talk: Sanjeev Arora

Sanjeev gave a survey of recent work by himself and others in working to develop provable bounds on the performance of many problems across unsupervised learning. The talk itself was entitled "Overcoming Intractability for Unsupervised Learning", where the "unsupervised" refers (loosely) to being given a pile of raw data and asked to extra some relevant statistical data, e.g. given a collection of newspaper articles, what topics do they cover. The foothold for these problems is assuming that the data comes from some (unknown) distribution, and many fo the problems relate to learning something about the distribution.

Almost all of the problems covered are NP-hard, and the first part of the talk was an explanation of how this is overcome. In Sanjeev's words, "NP-hardness is not the right way to think about this". He explained that the starting point is determining if the set of interesting/realistic/natural instances lie in a tractable subset by looking for realistic assumptions that lead to tractable instances.
As an example, he described the topic modelling problem, where one is given a corpus of text (e.g. all New York Times articles) and the goal is to describe each article as a linear combination of topics, where the topics are not given in advance (i.e. unsupervised learning). A geometric view of this problem is being given a set of points in a convex polytope (articles) and being asked to find the vertices of the polytope (topics). The problem is NP-hard in general, but can be made tractable by assuming that the data is separable: the vertices are included in the set of points given. Then a polynomial time algorithm have be obtained: for each point, test if it's contained on the convex hull, if not, it can be removed, and otherwise it must be a vertex.

He also spoke about making process on other machine learning techniques, including dictionary learning (given complex objects built from a small, simple dictionary, compute the dictionary) and approaches like deep learning (many-level neutral networks). Dictionary learning and deep learning are both thought to take place in the human visual cortex, giving good evidence that their worst-case intractability is not a barrier to their effective use. For the case of deep learning, Sanjeev pointed out that learning the network is effectively trying to learn a TC^0 (ouch!) and yet they have been used successfully by machine learning people to achieve high accuracy rates on large corpora.

Amir Abboud, Virginia Vassilevska Williams and Oren Weimann. Consequences of Faster Alignment of Sequences.

Amir gave this talk on conditional lower bounds on local matching a pair of strings: given a function defining a score for every pair of symbols, find a pair substrings from the input string that maximizes the score. Like other string problems, including longest common subsequence, the best known algorithm for local alignment takes quadratic time and has resisted improvements in the exponent to, say, O(n^1.99).
Amir was quick to point out local alignment is hugely important in genomic sequence analysis (heuristics like BLAST are hugely popular) and that quadratic is too slow in these cases, since n can be several billion. Abboud, Williams, and Weimann prove that a O(n^{2-ε})-time algorithm for local alignment implies the following results:
  • 3SUM solved in O(n^{2-δ}) time for some δ, refuting the weak form of the 3SUM conjecture.
  • CNF-SAT solvable in O((2-δ)^n) time, refuting the strong exponential time hypothesis.
  • Max-4-Clique solvable in O(n^{4-δ}) time for some δ, resolving a long-standing opening problem of improving the best-known algorithm for Max-4-Clique.
They also prove similar results for other problems, including edit distance with gap penalties, and longest commen subsequence with wildcards (input strings are allowed to have wildcard characters that match any character). The proofs follow the usual approach of using a O(n^{2-ε})-time local alignment algorithm to produce faster algorithms for the other problems and those that Amir showed were very clean.

Amihood Amir, Timothy M. Chan, Moshe Lewenstein and Noa Lewenstein. On Hardness of Jumbled Indexing.

This work has a similar flavor to the Abboud, Williams, Weimann paper, using the 3SUM conjecture to develop lower bounds for string problems. Unlike that paper, this work gives lower bounds for data structures answering queries. The main problem considered is of jumbled, a.k.a. histogram indexing: given a string T with alphabet of size k, preprocess T to answer queries of the form "what substrings of T have character occurrence counts ", i.e. what substrings have the given histogram. For many pattern matching problems on strings, suffix trees/arrays are used to achieve linear-time queries.
For histogram indexing, naive approaches yield either O(1) preprocessing and O(n) query time (scanning the string using a window whose size is the sum of the entries in the histogram) or O(kn^2) preprocessing time and O(1) query time (precompute the histograms of all substrings and look up the answer). The contribution of Amir, Chan, Lewenstein, and Lewenstein is to prove that, assuming the 3SUM conjecture, these results are the best possible.
For a fixed alphabet of size k at least 3, they prove that achieving O(n^{2-2/k}) preprocessing and O(n^{1-1/k}) query time simultaneously is 3SUM-hard. For alphabets of non-constant size, they prove that achieving O(n^{2-ε}) preprocessing and O(n^{1-δ}) query time is 3SUM-hard. He also mentioned that unpublished work with Timothy Chan achieves a positive result in the form of a data structure that achieves O(n^{2 - 2/k + O(1)}) preprocessing and O(n^{1 - 1/k + O(1)}) query time simultaneously for fixed alphabets.
Moshe did a good job of mixing the problems, motivation, and technical details in a way that kept the talk interesting. Related to the reductions in this work, which use a reduction from convolution 3SUM (testing whether there exist two indices i, j, of the input array such that A[i] + A[j] = A[i+j]), he also recommended that people read Patrascu's paper that proves this special form of 3SUM is 3SUM-hard.

John Iacono and Özgür Özkan. Why Some Heaps Support Constant-Amortized-Time Decrease-Key Operations, and Others Do Not

John's hand-drawn slides caught the attention of some people from the very beginning, and combined with his deadpan delivery and promise to "keep it light" left the audience quiet chuckling through the talk. The talk started with a chat about heaps. All good heaps have O(log(n))-time insert and extract-min operations. Decrease-key is also an operation of heaps; some heaps are better at it. There are three families of heaps with three different running times for decrease-key:
  • Fibonacci and "improvements" that achieve O(1) time.
  • Pairing heaps that achieve O(2^{2^{sqrt{loglog(n)}}}) time.
  • Elmasry/sort heaps that achieve O(loglog(n)) time.
All decrease-key operations of these trees pair small trees into bigger trees. What distingushes them is how they do this:
  • Pairing heaps: repeatedly pair trees recursively. Repeat until one tree remains.
  • Elmasry/sort heaps: break roots into blocks of size log(n), chain all roots of a block together.
  • For Fibonacci heaps: pair heaps whose roots have the same number of children.
Note that the Fibonacci pair based on tree size, not key value, and use an augmentation of loglog(n) bits in each root. Seems like you augmented bits = fast? Fredman considered this in 1999, proving O(1) decrease-key implies Omega(loglog(n)) bits for some classes of heaps, namely pairing heaps. But lower bound requires assumption that every time a pair of roots have their values compared, their heaps are combined.
This contribution of Iacono and Özkan is a new lower bound without such the requirement that trees are combined when compared, but with a new restriction: must pay pointer model costs. This result yields lower bound of Omega(loglog(n)) for pairing and sort heaps, but says nothing about Fibonacci heaps because they cheat: they do not pay pointer model costs. A pointer model version would build a tree above the array, which would have height loglog(n) and give decrease-key a similar running time. The actual proof is adversarial, and considers a whole family of distinct Fibonacci heaps that must grow larger than the total number of possible heaps, a contradiction.

ICALP Day 2 (guest post by Andrew Winslow)

Andrew Winslow was one of the colleagues who kindly answered my call for guest bloggers from ICALP 2014. Here is guest post on the second conference day. Enjoy it! 

Invited Talk: Victor Kuncak

Victor gave a talk entitled "Verifying and Synthesizing Software with Recursive Functions" about work with his students, including Ruzica Piskac, Phillipe Suter, Eva Darulova, Ettienne Kneuss, and Andrej Spielmann. The work focused on a number of problems in the area of turning a specification into a program that meets it. He did a great job of breaking out four types of problems they're working on:
Run-time Compile-time
Have specification C and program P Assertion checking
(Check P meets C)
Prove correctness
(Prove P always meets C)
Have specification C Constraint programming
(Carry out execution according to C)
Program synthesis
(Create P meeting C)
Their work is done in Scala and implemented in the Leon system. Out of the four types of problems, Victor indicated that the first two (first row of the table) are easier. Of course, easier doesn't mean easy, and he briefly covered a few of the ideas (induction, arithmetic/algebraic reasoning, SAT solvers, integer linear programs) used to achieve good empirical performance -- a few seconds to prove correctness programs of 100 lines of code or more, and counterexamples in the case of incorrect code.
Constraint programming was next, and he gave the example of programming red-black tress via constraints: using as constraints the five properties a red-black tree must have. Insertion, for instance, would be encoded as a combination of the five properties, plus the fact that the new tree must have the elements of the old tree plus the inserted item. Although the performance was not great, he stressed that the applications of constraint programming are more along the lines of prototyping new data structures, test case generation, etc. One cool trick was using the fact that a solution is a counterexample of the negation of the constraints...reducing the problem to one in program verification.
Finally, he covered most extreme problem: writing code based on a specification. Here he had some really nice examples, generating code for converting dates and times, simple data structures like sorted lists, etc. in just a few seconds. Many ideas were used, but I was especially impressed by his mention of their work on floating point representations, synthesizing programs that avoided rounding errors.

Andreas Björklund and Thore Husfeldt, Shortest Two Disjoint Paths in Polynomial Time

Andreas gave the talk for this Best Paper award winner in Track A. The problem considered is a very simple one: given an undirected graph and two pairs of vertices (s1, t1) and (s2, t2), find a pair of disjoint paths whose total length is minimized. Several problems near to this one are NP-hard (e.g. minimizing the maximum of the two path lengths), so placing the problem into P is the major thrust of the paper -- the algorithms given run in O(n^11) and O(n^22) time for vertex- and edge-disjoint paths, respectively.
The approach of the algorithm is almost entirely matrix-based, focusing on solving a cycle cover problem on the input graph with self-loops added to each vertex...except that doesn't quite work. Andreas did a great job of leading us on a journey where obstacle after obstacle to the approach was eliminated. Many of the issues relate to the complexity of computing permanent...indeed one of the major technical contributions is proving that computing the permanent of a matrix over a certain type of ring is in P. In the end, the necessary permanent computation is replaced with a polynomial number of determinant computations, giving the large (but polynomial) running time.
In perhaps the most natural course of events, someone asked after the talk whether the technique might work for the shortest three disjoint paths. So natural was the question that a followup slide for the question was already prepared. The conclusion was "probably not", but Andreas posed a first problem to try in this direction: given an undireced graph, compute the parity of the number of cycle covers having c cycles, with c divisible by 4.

Mitsuru Kusumoto and Yuichi Yoshida, Testing Forest-Isomorphism in the Adjacency List Model

Mitsuru started by introducing the notion of property testing on graphs: computing whether the graph has a given property, answering correctly at least 2/3rds of the time while using only a small number of queries on the graph. In this setting, even the representation of a graph as an adjacency matrix or list matters, as checking whether a given edge is present takes too long with an adjacent list repesentation.
Kusumoto and Yoshida considered the problem of testing whether two forests are isomorphic. It was known that in general graphs this problem requires Omega(sqrt(n)) queries to the graphs, and only O(1) if the graphs have bounded degree. For forests, they achieve an algorithm that uses only O(polylog(n)) queries and prove Omega(sqrt(log(n)) are needed. The algorithm works by first performing a special type of partitioning of the input graph, then testing whether each subgraph is isomorphic. The partitioning is done in a way that ensures that if the two input graphs are significantly far away from being isomorphic, then some pair of their subgraphs in their partitions are too.

Rom Aschner, Matthew J. Katz, Bounded-Angle Spanning Tree: Modeling Networks with Angular Constraints

Rom gave a nicely visual talk about a problem in wireless network design in the plane. The input is a set of points, and the output is a (Euclidean) minimum spanning tree (MST) with an additional constraint: the smallest angle spanning all of the edges is at most some fixed constant b (a b-MST). Another way to think about this constraint is that the largest angle between any pair of adjacent incoming edges must be at least 2*π - b.
For b < π / 3, a b-MST may not even exist: three points at the vertices of an equilateral triangle require at least one of the points to have two incoming edges with interior angle π / 3. For b >= 8 π / 5, the b-MST is simply the minimum spanning tree, as the upper degree bound of 5 on every node in an MST ensures the smallest angle spanning all of the edges is at least 2 * π - 2 π / 5 = 8 π / 5. Finally, for b < 1/2, there exists a simple family of examples (n-1 points placed collinearly ε distance apart and a final point distance 1 from the shared line) that causes the b-MST to have total weight ε * (n - 1) + 1 and b-MST to have total weight 1 * (n - 1) = n - 1. So the non-trivial b are those in the interval [1/2, 8/5*pi).
For these, Aschner and Katz prove both NP-hardness and approximation results. They prove that finding the b-MST for b = π and b = 2 π / 3 are both NP-hard, using a reduction from Hamiltonian path in square and hexagonal grid graphs, respectively. They also achieve constant-factor approximation algorithms for b = π, 2 π / 3, π / 2. The approximation algorithms use a common sequence of steps:
  1. Compute the MST.
  2. Turn the MST into a Hamiltonian cycle via a tour around the boundary of the MST.
  3. Decompose the points into consecutive groups along the cycle.
  4. For each group, compute a set of edges connecting the points and minimizing total edge length.
  5. Connect adjacent groups using the shortest edge possible.
Step 4 requires a careful selection of group sizes, as larger groups give greater potential for shorter collections of edges, but for large enough b, actually connecting the points may not be possible. The resulting approximation ratios are 16 and lower, and can further be reduced by a neat pigeonhole argument regarding shofting the groupings along the tour.

Canal Tour and Conference Dinner

The day concluded with a tour through the canals of Copenhagen and dinner at the Odd Fellow Palace. The 2014 Gödel Prize was awarded to Ronald Fagin, Amnon Lotem, and Moni Naor. Moni gave a great historical and technical talk on the work for which they received the award, including the threshold algorithm. Ronald Fagin also spoke briefly about Michael Franklin, the "missing fourth author" of the work, and his role in facilitating the successful collaboration.

Wednesday, July 09, 2014

July 8, 2014 — ICALP Review (guest post by Clément Canonne)

Here is a first guest post from ICALP 2014 kindly written by Clément Canonne. Enjoy it! (This  post is also available in PDF. Any typo or typesetting issue with the HTML version is my responsibility.)

 

On DNF approximators for monotone Boolean functions (Eric Blais)

In this talk, Eric Blais presented joint work with Johan Håstad, Rocco Servedio and Li-Yang Tan, concerned with Boolean functions. More precisely, "the simplest representations of functions": DNF (Disjunctive Normal Form) formulas.
For a little bit of background, recall that a Boolean function f : {0,1}n →{0,1} defined on the hypercube [2] is a DNF if it can be written as


that is as an OR of AND’s. One can also see such functions as being eactly those taking value 1 on an "union of subcubes" (if One prefers. I will not argue with One).

A nice property of DNF formulas is that they are arguably amongst the simplest of all representations of Boolean functions; while formulas of depth ≥ 3 are a nightmare, DNFs have been extensively studied, and by now "Everything is known about them". Well, almost everything.

Indeed, amidst other facts, we have that

Theorem 1 (Folklore). Every Boolean function can be computed by a DNF of size 2n-1.

Theorem 2 (Lupanov ’61). This is tight (PARITYn needs that much).

Theorem 3 (Korshunov '81, Kuznetsov '83). A random Boolean function can be computed by DNFs of size Θ(2n∕log n) (and requires that much).

So... are we done yet? The mere presence of Eric in the auditorium was a clear hint that all was not settled. And as it turns out, if the picture is well understood for exact computation of Boolean functions by DNFs, what about approximate representation of a function? That is, what about the size required to approximate a Boolean function by a DNF, if one allows error ε (as a fraction of the inputs).

This leads to the notion of DNF approximator complexity; and here again some results – much more recent results:

Theorem 4 (Blais–Tan ’13). Every Boolean function can be approximated by a DNF of size O(2n∕log n). Furthermore, our all friend PARITYn only needs DNF size O(2(1-2ε)n).

That’s way better than 2n-1. So, again – are we done here? And, again... not quite. This brings us to the main point of the paper, namely: what about monotone functions? Can they be computed more efficiently? Approximated more efficiently? (Recall that a Boolean function f is monotone if x ≼ y implies f(x) ≤ f(y), where ≼ is the coordinate-wise partial order on bit-strings.)
As a starter: no.

Theorem 5 (Folklore) Every monotone Boolean function can be computed by a DNF of size O(2n∕n1/2) (using subcubes rooted on each min-term); and again, this is tight for PARITY.
Furthermore, and quite intuitively, using negations does not buy you anything to compute a monotone function (and why should it, indeed?):

Theorem 6 (Quine ’54). To compute monotone Boolean functions, monotone DNFs are the best amongst DNFs.
Not surprising, I suppose? Well... it’s a whole new game when one (one, again!) asks only for approximations; and that’s the gist of the paper presented here. First of all, drastic savings in the size of the formulas!

Theorem 7 (Blais–Hastad–Servedio–Tan ’14). Every monotone Boolean function can be approximated by a DNF of size O(2n∕2Ω(n1/2) ).
Eric gave a high-level view of the proof: again, it works by considering the subcubes rooted on each min-term, but in two steps:
  • Regularity lemma: the world would be much simpler if all subcubes were rooted on the same level of the hypercube; so first, reduce it to this case (writing f = f1 + .....+ fk, each fi has this property)
  • then, approximate independently each fi, using a probabilistic argument (via random sampling), to prove there exists a good approximator for all fi’s, and then stitching them together.
And they also show it is tight: this time with the majority function MAJn. The proof goes by a counting argument and concentration of measure on the hypercube (every or almost every input is on the middle "belt" of the hypercube; but each subcube thus has to be rooted there, and each cannot cover too much... so many are needed)

So, approximation does buy us a lot. But clearly, using negations shouldn’t, should it? Why would allowing non-monotone DNF’s to approximate monotone functions ever help? (Hint: it does.) (Yep.)

Theorem 8 (Blais–Hastad–Servedio–Tan ’14). For every n, there exists εn and f : {0,1}6n →{0,1} such that
  • f can be εn-approximated by DNFs of size O(n);
  • any monotone DNF εn-approximating f must have size Ω(n2).
(Take that, intuition!)

The upshot: exact computation and approximate computation have intrinsically very different properties!

Eric then concluded with an open question: namely, how to improve/better understand the gap between approximating functions with monotone DNF vs. approximating them with general DNF’s (the currently known gap in the size being quite huge – almost exponential). Additionally, how to get a separation as in the mind-boggling theorem above, but changing the quantifiers – that is, for a constant ε independent of n?

Also, can someone fix my intuition? I think it’s broken.

Day 1 of ICALP 2014 (guest post by Andrew Winslow)


Andrew Winslow was one of the colleagues who kindly answered my call for guest bloggers from ICALP 2014. Here is guest post on the first conference day. Enjoy it!

Conference Welcome

Local organizing committee chair Thore Husfeldt kicked off ICALP by mentioning a few stats about IT University of Copenhagen (ITU), where the conference is located:
  • Started in 1999(!)
  • 2000 students.
  • 10 theory CS faculty (mostly working in "track B" areas in ICALP parlance)
  • 1 building.
He also noted that ITU is separate from two other universities in Copenhagen hosting TCS conferences this month: Copenhagen University (hosting SEA last week) and Technical University of Denmark (hosting SWAT last week).

Invited Talk: Claire Matthieu

Claire presented work entitled "Homophily and the Emergence of a Glass Ceiling Effect in Social Networks", joint work with Chen Avin, Zvi Lotker, Barbara Keller, David Peleg, and Yvonne-Ann Pignolet. She started the talk by asking why there's only one female Danish scientist (Inge Lehmann) listed on the Wikipedia page for famous Danish scientists out of about 60 entries -- why so rare? Looking at the DBLP database data as a social network with co-authorship edges gives similarly bleak results: the induced subgraphs of men and women in the top 1000 nodes shows a large, well-connected graph and a disconnected, sparse graph, respectively.
Claire and co-authors developed a formal model of growing co-authorship social networks, including assumptions about what a "glass ceiling effect" means for these networks, and proved that three properties are minimal sufficient set of properties to imply the effect. These properties are:
  • Preferential attachment: a "rich get richer" effect where incoming students choose to work an advisor with probability proportional to the number of co-authors the advisor has.
  • Minority: an incoming student is female with probability r < 0.5.
  • Homophily: students stick with an advisor of the same gender with probability 1, otherwise find a new advisor with probability p < 1.
Incoming students are iteratively added to the graph to form growing random graphs, repeatedly chosing new advisors according to homophily and preferential attachment. The glass ceiling effect is defined by looking at all nodes with degree k, where k = k(n) chosen so that the number of such nodes goes to infinity with n, and examining the ratio of women to men in this set. If the expected ratio of female over male nodes goes to 0 as the number of nodes goes to infinity, then the system is said to have a glass ceiling effect for the parameters r, p chosen. They are able to prove that the above three properties imply the class ceiling effect, and removing any one of them does not. The model was also verified experimentally using the DBLP database.
As a final twist, Claire considered modifying the model to account to capture the "men are intrinsically better" hypothesis, giving new male students two advisors rather than one. This was proven not to cause the glass ceiling effect. I was really struck by the meta-ness of the talk and how natural the problem seems. In the formal study of (social) networks, isn't the most natural network of study the one that directly impacts our ability to formally study networks?

György Dósa and Jiří Sgall. Optimal Analysis of Best Fit Bin Packing.

Dösa and Sgall complete the analysis of two classic (online) algorithms for the standard bin packing problem: given a collection of items specified by numbers in [0, 1], assign these items to the fewest number of bins where each bin has an item total of at most 1. The two algorithms, first-fit and best-fit, process the items one-by-one, placing each item permanently into an existing bin if possible, otherwise placing the item into a newly created bin. First-fit and best-fit differ only in the criteria by which they place item into existing bins, using the oldest bin (first-fit) or bin that would have the least remaining room after the item is placed into it (best-fit). It was previously shown that for every instance with optimal solution using OPT bins, first-fit and best-fit use ceil(1.7*OPT) bins and there exist instances where both use 1.7*OPT bins. Dösa and Sgall improved the upper bound to 1.7*OPT bins, dropping the ceiling and completing the analysis of the competitive ratio of these algorithms. Jiří commented that he felt that such classic algorithms deserved a complete analysis and I agree; the talk had an air of finality, having put the 40-year-old gap to rest.

Hossein Esfandiari, Mohammadtaghi Hajiaghayi, Reza Khani, Vahid Liaghat, Hamid Mahini and Harald Räcke. Online Stochastic Reordering Buffer Scheduling.

Vahid Liaghat gave this talk on a scheduling problem where items have types from a fixed universe and enter into a buffer of sized size k. One can then process all items of a type, flushing them from a buffer and causing new items to take their place. The cost of processing a sequence of items is either the number of processing steps used (block operation model) or number of times the type processed changes (standard model). It was known that a O(loglog(k))-competitive algorithm is possible for the adversarial/worst-case. This work studies the sitation where the input is not adversarial, but is randomly sampled from some set. Esfandiari et al. achieve an average-case O(1)-competitive algorithm for both the block operation and standard models for the case where an adversary designs an input that is then randomly permuted. The analysis uses several lemmas regarding the optimal solutions of various buffer sizes as well as threshold algorithms: algorithms that process each type as soon as a specific number of occurrences of a type is exceeded.

Erik D. Demaine, Yamming Huang, Chung-Shou Liao and Kunihiko Sadakane. Canadians Should Travel Randomly.

This paper attracted a few comments related to the curious title. The title comes from the "canandian traveler problem" of study, introduced by Papadimitriou and Yannakakis in 1991. The problem is a shortest path problem with partial information : some of the edges of the path are blocked (by intense canadian snowfall) and cannot be traversed, but this is only known after a traveler has visited an endpoint of the blocked edge. The parameterized version of the problem has k blocked edges, and the goal is find an algorithm with good competitive ratio compared to the shortest path that avoids all blocked edges. For deterministic algorithms, tight 2k+1 upper and lower bounds on the ratio are known, while for randomized algorithms, only a lower bound of k+1 (and implied upper bound of 2k+1) were known.
Demaine, Huang, Liao, and Sadakane improve the best-known randomized algorithm in two ways, giving a polynomial-time algorithm that achieves a ratio of (2 - o(1))k + 1, and a pseudo-polynomial-time algorithm (parameterized by k, the number of blocked edges) that achieves a (1 + sqrt(2)/2)k + 1 ratio. The ideas for both algorithms are to repeatedly compute a collection of nearly shortest paths, traverse them each until blocked, and repeating the process with the new information if none of the paths lead to reaching the destination. A careful selection of the number of paths used then yields the bound. The main difference between the two algorithms is exactly what short paths are used; the pseudo-polynomial-time algorithm uses a list of shortest paths, whereas the polynomial-time algorithm only finds paths within a constant factor of the shortest.

Dmitry Gavinsky and Shachar Lovett. En Route to the log-rank Conjecture: New Reductions and Equivalent Formulations.

Dmitry started by reviewing two-party communication problems, where the goal is to compute a combined Boolean function f of inputs x and y held by the two parties while passing the fewest number of bits between the parties. One can consider the matrix defined by the two inputs, with an entry for every (x, y) pair, called the Boolean matrix. The rank of this matrix (and Boolean function f) is the minimum number of rectangles found in some partition of the matrix into rectangular blocks of entries with the same value. It is known that if a c-bit communication protocol is possible, then rank of the matrix is at most 2^c, i.e. the communication complexity (in bits) is bounded from below by log(rank(f)). The log-rank conjecture is that polylog(rank(f)) is an upper bound on the communication complexity of computing f.
Part of this work by Gavinsky and Lovett is to provide a number of equivalent versions of the problem that may provide a way to make progress on the log-rank conjecture. For instance, they show that proving the log-rank conjecture for the randomized communication complexity on low-rank functions implies the log-rank conjecture for the deterministic version, as does proving the conjecture for the information cost of f, the minimum amount of information given by each party the other party in the worst-case.

EATCS General Assembly

The day ended with the EATCS general assembly led by Luca Aceto and appearances by the many people involved in ICALP and EATCS. ICALP 2015 will be in Kyoto, Japan, and some information about travel, accomodations, and Kyoto were presented by Kazuo Iwama, and ICALP 2016 will be in Rome, Italy. Some statistics about ICALP 2014 were given by Elias Koutsoupias -- the number of submissions is up from previous years, acceptance rate is about steady at 30%, and the vast bulk of the submissions were to to Track A.
As one final thought from the day, the organization of the conference so far has been top-notch. Many of the talks are online (including Claire Matthieu's), the sessions start and end on time, sound and projection issues are non-existent, and the food/drink is tasty (and rotates every break...).