I am interested in distributed systems in general.

I am involved in the ESRC-funded Digitising Scotland project, which aims to construct a linked genealogy of Scottish historical records, with Chris Dibben, Lee Williamson and Zhiqiang Feng at Edinburgh, and Alan Dearle, Özgür Akgün and Tom Dalton in Computer Science at St Andrews. The project also involves Eilidh Garrett and Alice Reid at Cambridge, and Peter Christen at ANU.

I previously led a work package on linkage methodology within the ESRC-funded Administrative Data Research Centre - Scotland (funding), with Alan Dearle, Özgür Akgün, Peter Christen and Alasdair Gray at Heriot-Watt.

I supervise Tom Dalton, whose PhD is on handling uncertainty in data linkage, with a focus on using synthetic population-scale data to evaluate population linkage approaches.

Possible PhD Projects

Analysis and Linking of Large-Scale Genealogical Datasets

This project investigates methods for analysing and linking large sets of individual genealogical records. The core problem is to take a set of digitised records, and from this create a set of inter-linked pedigrees for the population. This specification may be usefully refined with the introduction of provenance and confidence, so that rather than producing a single set of pedigrees, a set of potential pedigrees is produced along with evidence for the relationships within them. From such a representation it is possible to project out various specific sets by defining appropriate criteria.
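As an illustration only, such a representation could store each hypothesised relationship together with a confidence and the records that support it, so that projecting out a specific set becomes a filter over the candidates. The sketch below assumes a numeric confidence in [0, 1]; `CandidateLink`, `project` and `most_likely_per_target` are hypothetical names, not part of any existing system, and the confidence values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateLink:
    """One hypothesised relationship between two individuals (hypothetical schema)."""
    relation: str          # e.g. "mother_of"
    source: str            # identifier of the first individual
    target: str            # identifier of the second individual
    confidence: float      # assumed to lie in [0, 1]
    evidence: tuple = ()   # identifiers of the records supporting this link


def project(links, criterion):
    """Return the candidate links that satisfy a selection criterion."""
    return [link for link in links if criterion(link)]


def most_likely_per_target(links, relation):
    """Keep only the highest-confidence candidate for each target of a relation."""
    best = {}
    for link in links:
        if link.relation != relation:
            continue
        current = best.get(link.target)
        if current is None or link.confidence > current.confidence:
            best[link.target] = link
    return list(best.values())


# Two competing hypotheses about Y's mother, each with its own evidence.
links = [
    CandidateLink("mother_of", "X", "Y", 0.8, ("P", "Q", "R")),
    CandidateLink("mother_of", "Z", "Y", 0.3, ("S", "T", "V")),
]
confident = project(links, lambda link: link.confidence >= 0.5)
best = most_likely_per_target(links, "mother_of")
```

Because every candidate is retained and the choice is deferred to a projection, different users could apply different criteria (a confidence threshold, a single best guess per person) to the same underlying set.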

We thus need a computational process that will not only give results such as X is the mother of Y, but also that X may be the mother of Y based on information P, Q, R, and that Z may be the mother of Y based on S, T, V. This will provide a richer information source for future research, and requires new ways of approaching the problem. In the past, algorithms have been run on data sets to produce definitive pedigrees. We propose to attack this at a meta level with a reasoning engine that will not only produce results but also reasons for those results.
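One way to obtain results with reasons is to express each linkage rule as a function that, when it fires, returns the reasons for its conclusion, and to keep every conclusion rather than choosing one. The rule, the record fields (`maiden_surname`, `mother_maiden_surname`, `birth_year`, `record_id`) and the 15 to 50 age range below are hypothetical assumptions made for illustration, not the project's actual rules.

```python
from collections import defaultdict
from itertools import permutations


def maiden_surname_and_plausible_age(mother, child):
    """Hypothetical rule: a woman may be a child's mother if her maiden surname
    matches the mother's maiden surname on the child's birth record and she
    was aged 15 to 50 at the child's birth. Returns a list of reasons, or None."""
    maiden = child.get("mother_maiden_surname")
    if mother.get("sex") != "F" or maiden is None or maiden != mother.get("maiden_surname"):
        return None
    age_at_birth = child["birth_year"] - mother["birth_year"]
    if not 15 <= age_at_birth <= 50:
        return None
    return [
        f"maiden surname {maiden} matches birth record {child['record_id']}",
        f"age at child's birth ({age_at_birth}) lies within 15-50",
    ]


# Rules grouped by the relationship they can conclude (hypothetical rule set).
RULES = {"mother_of": [maiden_surname_and_plausible_age]}


def infer(people):
    """Apply every rule to every ordered pair of people and keep all resulting
    conclusions, each paired with the rule's name and the reasons it gave."""
    conclusions = defaultdict(list)
    for a, b in permutations(people, 2):
        for relation, rules in RULES.items():
            for rule in rules:
                reasons = rule(a, b)
                if reasons is not None:
                    conclusions[(a["id"], relation, b["id"])].append((rule.__name__, reasons))
    return dict(conclusions)


people = [
    {"id": "X", "sex": "F", "maiden_surname": "Brown", "birth_year": 1830, "record_id": "b-1830-017"},
    {"id": "Z", "sex": "F", "maiden_surname": "Brown", "birth_year": 1840, "record_id": "b-1840-102"},
    {"id": "Y", "sex": "M", "mother_maiden_surname": "Brown", "birth_year": 1860, "record_id": "b-1860-044"},
]
for (a, relation, b), support in infer(people).items():
    print(a, relation, b, support)
```

Here both X and Z emerge as possible mothers of Y, each with its own reasons; choosing between them is left to a later step rather than being fixed inside the rule.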

For flexibility and long-term usefulness, we do not envisage a one-off process in which a set of records is fed into an algorithm and a set of pedigrees is output. Instead, a continuous process will accept an indefinite stream of records to feed into and refine an established knowledge base. Similarly, the set of rules that govern the relationships between records may be refined and evolved over time. This evolutionary approach yields a highly flexible knowledge and reasoning engine; however, allowing rules and inferred relationships to change carries the danger of information being lost during the process. To prevent this, we propose a non-destructive append-only approach to the storage of data and meta-data.
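A minimal sketch of the non-destructive approach, assuming facts are kept in an event log in which a retraction is itself a new entry: nothing is ever overwritten, so the state at any earlier point can be reconstructed by replaying the log. `AppendOnlyStore` and its methods are hypothetical; a real store would also need persistence and indexing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    seq: int       # position in the log; never reused
    op: str        # "assert" or "retract"
    fact: tuple    # e.g. ("mother_of", "X", "Y")
    reason: str    # provenance: which rule or record produced this entry


class AppendOnlyStore:
    """Hypothetical append-only store: entries are never modified or deleted."""

    def __init__(self):
        self._log = []

    def append(self, op, fact, reason):
        if op not in ("assert", "retract"):
            raise ValueError(f"unknown operation: {op}")
        entry = Entry(len(self._log), op, fact, reason)
        self._log.append(entry)
        return entry.seq

    def state(self, as_of=None):
        """Replay the log up to and including sequence number `as_of` (default: all)."""
        current = set()
        for entry in self._log:
            if as_of is not None and entry.seq > as_of:
                break
            if entry.op == "assert":
                current.add(entry.fact)
            else:
                current.discard(entry.fact)
        return current

    def history(self, fact):
        """Every entry ever recorded about a fact, including superseded ones."""
        return [e for e in self._log if e.fact == fact]


store = AppendOnlyStore()
first = store.append("assert", ("mother_of", "X", "Y"), "rule v1")
store.append("retract", ("mother_of", "X", "Y"), "rule v2 rejects age gap")
store.append("assert", ("mother_of", "Z", "Y"), "rule v2")
```

The current view reflects the revised rules, while the earlier conclusion and the reason for withdrawing it remain recoverable. Rule definitions could plausibly be versioned the same way, by logging them as entries.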

We hope to evaluate the techniques developed using the full set of Scottish birth, death and marriage records from 1850 to the present. A preliminary phase of the project is now under way. [With Alan Dearle]

Towards Pervasive Personal Data

This project will investigate techniques for making file data pervasively available across all the storage resources available to an individual or an organization. These may encompass personal computers and mobile devices, machines available within a work environment, and commercial online services. The envisioned infrastructure observes file changes occurring in any particular storage location, and automatically propagates those changes to all other appropriate locations, much as existing services such as Windows Live Mesh, Dropbox and SugarSync do, but it:

  • does not rely on external services
  • observes and exploits variations in machine and network capabilities
  • exploits disparate storage facilities, including those provided by machines not exclusively under the user’s control
  • exploits disparate data transfer mechanisms, including physical movement of passive devices
  • supports high-level user policies governing data placement and synchronization strategy
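The policy-driven propagation described above might be sketched as follows, assuming each storage location advertises whether it is under the user's exclusive control and how much free space it has, and that user policies are predicates over a change and a candidate location. All names, sizes and the two example policies are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    name: str
    trusted: bool      # under the user's exclusive control?
    free_bytes: int


@dataclass(frozen=True)
class FileChange:
    path: str
    size: int
    sensitive: bool
    origin: str        # name of the location where the change was observed


def keep_sensitive_on_trusted(change, loc):
    """Example policy: sensitive files are placed only on trusted locations."""
    return loc.trusted or not change.sensitive


def fits(change, loc):
    """Example policy: only place a file where there is room for it."""
    return loc.free_bytes >= change.size


POLICIES = [keep_sensitive_on_trusted, fits]


def propagation_targets(change, locations, policies=POLICIES):
    """Locations, other than the origin, to which a change should be propagated."""
    return [
        loc.name
        for loc in locations
        if loc.name != change.origin and all(p(change, loc) for p in policies)
    ]


locations = [
    Location("laptop", trusted=True, free_bytes=100 * 10**9),
    Location("phone", trusted=True, free_bytes=1 * 10**9),
    Location("work-pc", trusted=False, free_bytes=500 * 10**9),
]
tax = FileChange("docs/tax-return.pdf", size=500 * 10**6, sensitive=True, origin="laptop")
photo = FileChange("photos/beach.jpg", size=5 * 10**6, sensitive=False, origin="phone")
print(propagation_targets(tax, locations))    # ['phone']
print(propagation_targets(photo, locations))  # ['laptop', 'work-pc']
```

Treating policies as independent predicates lets a user add or remove one without touching the propagation logic; a fuller design would also weigh machine and network capabilities when choosing among the permitted targets.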

The main challenges to implementing such infrastructure are in detecting and exploiting patterns in resource availability; planning, executing and monitoring good routes and schedules for data propagation; and supporting users in visualizing current system state and its relation to the goals of their high-level policies.
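Planning propagation routes over intermittently available resources resembles routing in delay-tolerant networks. As a hedged illustration of one technique borrowed from that area, not necessarily the one this project would adopt, the sketch below assumes the detected availability patterns have already been turned into a list of predicted transfer windows ("contacts") and runs a Dijkstra-style earliest-arrival search over them. Physically carrying a USB drive is modelled as just another contact. The function name, tuple layout and times (in hours) are hypothetical.

```python
import heapq


def earliest_arrival(contacts, origin, t0):
    """Earliest time at which data present at `origin` at time `t0` can reach
    each location. Each contact is (src, dst, start, end, duration): a transfer
    from src to dst may begin at any time in [start, end - duration] and
    completes `duration` later (durations assumed non-negative).
    Returns {location: (arrival_time, route)}."""
    by_src = {}
    for contact in contacts:
        by_src.setdefault(contact[0], []).append(contact)

    best = {origin: (t0, [origin])}
    heap = [(t0, origin)]
    while heap:
        t, node = heapq.heappop(heap)
        if t > best[node][0]:
            continue  # stale heap entry; a better arrival was already found
        for _, dst, start, end, duration in by_src.get(node, []):
            depart = max(t, start)           # wait for the window to open
            if depart + duration > end:
                continue                     # window closes before transfer completes
            arrive = depart + duration
            if dst not in best or arrive < best[dst][0]:
                best[dst] = (arrive, best[node][1] + [dst])
                heapq.heappush(heap, (arrive, dst))
    return best


contacts = [
    ("home-pc", "usb-drive", 8, 9, 0.5),        # copy to a drive before leaving
    ("usb-drive", "office-pc", 9.5, 18, 0.25),  # drive plugged in at the office
    ("home-pc", "office-pc", 0, 24, 12),        # slow network link
]
print(earliest_arrival(contacts, "home-pc", 7)["office-pc"])
```

Here the copy carried on the drive reaches the office PC at 9.75, well before the slow link would deliver it at 19. The greedy ordering is valid because waiting longer at a location can never produce an earlier arrival further along the route.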

Last Published: 12 Aug 2021.