1st - 5th AUGUST 2014

Brisbane Convention & Exhibition Centre

  • Mini-Conferences
    August 1
  • Presentations
    August 2-3
  • Sprints
    August 4-5

Record linkage: Join for real life

In an ideal world, you can join on candidate keys, names are never misspelt, people never move house, and I have a pony. I don’t have a pony. But we do have record linkage.

Record linkage (also known as data linkage or fuzzy record matching) matches data about individuals across databases or within a single database. The probabilistic form discussed here is a naive Bayes algorithm: it estimates the probability that two records refer to the same individual or to different individuals.
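As a rough illustration of the naive Bayes idea, here is a minimal sketch in Python. It is not the talk's own code. The field names, the m- and u-probabilities, and the prior are all made-up values for illustration, and fields are compared by exact equality, where a real linkage system would normally use fuzzy comparators and estimate these probabilities from data.

```python
import math

# Hypothetical per-field probabilities, chosen for illustration only:
#   m = P(field agrees | both records are the same person)
#   u = P(field agrees | records are different people)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
    "postcode": (0.70, 0.001),
}


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum the per-field log2 likelihood ratios.

    Treating fields as conditionally independent is the "naive" part
    of naive Bayes; it lets the per-field evidence simply add up.
    """
    total = 0.0
    for field, (m, u) in field_probs.items():
        if rec_a.get(field) == rec_b.get(field):
            total += math.log2(m / u)              # agreement: evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement: evidence against
    return total


def match_probability(weight, prior=0.001):
    """Turn a match weight into a posterior probability.

    `prior` is an assumed share of candidate pairs that are true matches.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 2 ** weight
    return posterior_odds / (1 + posterior_odds)


# Three made-up records: a and b plausibly describe the same person
# (name variant, changed postcode); c is someone else.
a = {"surname": "smith", "given_name": "jon", "birth_year": 1980, "postcode": "4000"}
b = {"surname": "smith", "given_name": "john", "birth_year": 1980, "postcode": "4101"}
c = {"surname": "nguyen", "given_name": "linh", "birth_year": 1975, "postcode": "2000"}

for name, other in (("a vs a", a), ("a vs b", b), ("a vs c", c)):
    w = match_weight(a, other)
    print(f"{name}: weight={w:.2f}, probability={match_probability(w):.4f}")
```

Pairs with a high weight would be accepted as links and pairs with a low weight rejected. The pairs in between (such as a and b here) are the ones where comparator choice and thresholds matter most.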

In this talk I will discuss the techniques of record linkage in Python, the usefulness and the limitations of linkage, and the effect that this technique is having on healthcare research in particular.

Healthcare and epidemiology studies often require data from more than one source, and individuals frequently appear many times in a data set that carries no unique identifier. The outcomes of these studies are only as good as the record linkage which underlies them, so the ways in which record linkage is done can have a direct impact on our ability to understand, prevent and treat serious medical conditions.

Rhydwyn Mcguire

Rhydwyn is a researcher, data geek, and public health biostatistician, currently working on real-time disease monitoring and automated coding from free-text medical notes.

Mostly due to being bad at avoiding the opportunity to investigate interesting things, Rhydwyn's career has involved chasing data in all kinds of directions. This has led to pursuing majors in statistics, mathematics and computer science, and a master's in biostatistics.