I am interested in distributed systems in general, and distributed storage, peer-to-peer systems and middleware in particular.

I am involved in the ESRC-funded Digitising Scotland project, which aims to construct a linked genealogy of Scottish historical records, together with Chris Dibben, Lee Williamson, Zhiqiang Feng and Zengyi Huang at Edinburgh, and Alan Dearle, Özgür Akgün and Tom Dalton in Computer Science at St Andrews. So far we have focused on the automatic classification of certain fields within the records (cause of death and occupation); we are now starting to experiment with various probabilistic linkage approaches. This work also involves Eilidh Garrett and Alice Reid at Cambridge, and Peter Christen at ANU.
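One common family of probabilistic linkage approaches follows the Fellegi-Sunter model, in which each compared field contributes a log-likelihood-ratio weight towards a match decision. The sketch below is purely illustrative: the field names, m/u probabilities and example records are hypothetical, not drawn from the project's actual data or model.

```python
# Minimal sketch of Fellegi-Sunter-style probabilistic linkage.
# Field names and probabilities are illustrative assumptions.
import math

# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELDS = {
    "surname":    {"m": 0.95, "u": 0.01},
    "forename":   {"m": 0.90, "u": 0.05},
    "birth_year": {"m": 0.85, "u": 0.10},
}

def match_weight(rec_a, rec_b):
    """Sum of per-field log2 likelihood ratios: agreement adds
    log2(m/u), disagreement adds log2((1-m)/(1-u))."""
    w = 0.0
    for field, p in FIELDS.items():
        if rec_a[field] == rec_b[field]:
            w += math.log2(p["m"] / p["u"])
        else:
            w += math.log2((1 - p["m"]) / (1 - p["u"]))
    return w

a = {"surname": "MacDonald", "forename": "Ann", "birth_year": 1871}
b = {"surname": "MacDonald", "forename": "Ann", "birth_year": 1872}
print(match_weight(a, b))  # a high positive weight suggests a likely link
```

Record pairs whose total weight exceeds an upper threshold are accepted as links, those below a lower threshold are rejected, and the band in between is typically sent for clerical review.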

I also lead a work package on linkage methodology within the ESRC-funded Administrative Data Research Centre - Scotland, with Alan Dearle, Özgür Akgün, Peter Christen and Alasdair Gray at Heriot-Watt.

I am the supervisor of Tom Dalton, who is doing his PhD on handling uncertainty in data linkage, and the second supervisor of Simone Conte, who is investigating user models for managing distributed data.

Previous PhD Students

  • Masih Hajiarab Derkani (2014): did his PhD work on Adaptive Dissemination of Network State Knowledge in Structured Peer-to-Peer Networks. He has published Trombone, an adaptive P2P overlay, and Shabdiz, a very lightweight Java tool that monitors a set of machines and ensures that a given application remains running on them.
  • Markus Tauber (2010): applied autonomic management to distributed storage systems. He looked at autonomic control of maintenance scheduling in Chord, and of replica retrieval concurrency in a simple distributed block storage system. The work is reported in papers at DANMS 2011 and Self-Adaptive Networking 2010 and in his thesis: Autonomic Management in a Distributed Storage System.
  • Aled Sage (2003): addressed the problem of how to configure a software system with a large number of tuning parameters. He developed a tool to run performance tests automatically with various parameter values, so that the best combination of values could be selected. Exhaustive search is clearly impractical due to the combinatorial explosion in the number of possible combinations; the problem is exacerbated by the fact that in a non-trivial system each test may take a relatively long time – in the case study of an industrial mail server, each test took 30 minutes. He used Taguchi’s Design of Experiments approach to select a very small subset of combinations from which reasonable conclusions could still be drawn. The work is reported in a paper at CDSA 2001 and in his thesis: Observation-Driven Configuration of Complex Software Systems.
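The Taguchi approach mentioned above uses orthogonal arrays so that a balanced fraction of the parameter combinations still lets each parameter's main effect be estimated. The sketch below shows the idea with a standard L4(2³) array: three two-level parameters screened in 4 runs instead of the full 2³ = 8. The parameter names, levels and toy benchmark are hypothetical, not from the thesis or the mail-server case study.

```python
# Taguchi-style screening with an L4(2^3) orthogonal array.
# Parameter names, levels and the benchmark function are illustrative.

# L4 orthogonal array: each column is balanced (each level appears
# twice), and every pair of columns contains all four level pairs.
L4 = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]

# Two candidate levels per (hypothetical) tuning parameter.
levels = {
    "thread_pool_size": [8, 32],
    "cache_mb":         [64, 256],
    "batch_size":       [10, 100],
}

def run_performance_test(config):
    # Stand-in for a real (e.g. 30-minute) benchmark run; a simple
    # deterministic response keeps this sketch executable.
    return (config["thread_pool_size"] * 0.5
            + config["cache_mb"] * 0.01
            - config["batch_size"] * 0.02)

names = list(levels)
results = []
for row in L4:
    config = {n: levels[n][lvl] for n, lvl in zip(names, row)}
    results.append((row, run_performance_test(config)))

# Main effect of each parameter: mean response at level 1 minus
# mean response at level 0, computable thanks to column balance.
for i, name in enumerate(names):
    hi = sum(r for row, r in results if row[i] == 1) / 2
    lo = sum(r for row, r in results if row[i] == 0) / 2
    print(f"{name}: effect = {hi - lo:+.2f}")
```

The parameter with the largest absolute effect dominates the response, so subsequent, more expensive tests can concentrate on it rather than on the full combinatorial space.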

Last Published: 11 Jun 2018.