Jared S. Murray

Brendan McVeigh

BayesianRecordLinkage.jl: Julia implementation of unsupervised methods for one-to-one record linkage. Both an EM-algorithm-based method and an MCMC algorithm employing a standard Fellegi-Sunter-based approach are included. The module also includes methods for post-hoc blocking and a penalized-likelihood-based point estimate. See McVeigh, Spahn, and Murray (2019) for details.
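
For readers unfamiliar with the Fellegi-Sunter approach, the Python sketch below computes the log-likelihood-ratio weight of a single record pair from a binary comparison vector. It is an illustration only, not the BayesianRecordLinkage.jl API: the function name `fs_weight` and the m- and u-probabilities are hypothetical, and the sketch assumes the usual conditional independence of fields given match status.

```python
import math

def fs_weight(gamma, m, u):
    """Fellegi-Sunter log-likelihood-ratio weight for one record pair.

    gamma: sequence of booleans, True where the two records agree on a field
    m:     P(agree on field | records are a true match), one value per field
    u:     P(agree on field | records are not a match), one value per field
    Assumes fields are conditionally independent given match status.
    """
    total = 0.0
    for agree, mk, uk in zip(gamma, m, u):
        if agree:
            total += math.log(mk / uk)              # agreement raises the weight
        else:
            total += math.log((1 - mk) / (1 - uk))  # disagreement lowers it
    return total

# Hypothetical parameters for two comparison fields.
m_probs = [0.9, 0.8]
u_probs = [0.1, 0.2]
print(fs_weight([True, True], m_probs, u_probs))    # large positive: likely match
print(fs_weight([False, False], m_probs, u_probs))  # large negative: likely non-match
```

Pairs with high weights are candidate links; a one-to-one constraint then restricts each record to at most one link.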

AssignmentSolver.jl: Julia implementation of algorithms for solving linear sum assignment problems. This includes a version of the well-known Hungarian algorithm as well as auction algorithms (Bertsekas 1992).
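
To make the problem concrete, here is a minimal Python sketch of a forward auction algorithm for a square assignment problem in maximization form. It is not the AssignmentSolver.jl interface: the function name `auction_assignment` and the example benefit matrix are hypothetical. With integer benefits and a bid increment `eps` smaller than 1/n, the resulting assignment is optimal.

```python
def auction_assignment(benefit, eps=None):
    """Assign each row (person) to a distinct column (object), maximizing total benefit.

    benefit: square list of lists with integer entries.
    Returns a list `assigned` where assigned[i] is the column given to row i.
    """
    n = len(benefit)
    if eps is None:
        eps = 1.0 / (n + 1)          # eps < 1/n gives optimality for integer benefits
    prices = [0.0] * n               # current price of each object
    owner = [None] * n               # owner[j]: person currently holding object j
    assigned = [None] * n            # assigned[i]: object currently held by person i
    unassigned = list(range(n))

    while unassigned:
        i = unassigned.pop()
        # Net value of each object to person i at current prices.
        values = [benefit[i][j] - prices[j] for j in range(n)]
        best_j = max(range(n), key=values.__getitem__)
        best = values[best_j]
        second = max((v for j, v in enumerate(values) if j != best_j), default=best)
        # Bid: raise the price by the margin over the second-best option, plus eps.
        prices[best_j] += best - second + eps
        previous = owner[best_j]
        if previous is not None:     # the outbid person re-enters the auction
            assigned[previous] = None
            unassigned.append(previous)
        owner[best_j] = i
        assigned[i] = best_j
    return assigned

benefit = [[10, 5, 8],
           [7, 9, 6],
           [4, 3, 11]]
print(auction_assignment(benefit))  # [0, 1, 2]
```

A cost-minimization problem can be handled by negating the cost matrix before calling the function.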

Neil Spencer

cindRella: cindRella contains the code necessary to build our R package for evaluating footwear evidence (see Spencer and Murray, 2020). It also contains synthetic data to demonstrate its use.

Software for inferring heterogeneous treatment effects using Bayesian Causal Forests; see Hahn, Murray, and Carvalho (2017).

The stable version is available on CRAN. Note that versions prior to 1.2 had a bug in the interface, so older installations should be updated. See https://github.com/jaredsmurray/bcf for the latest updates.

MixedDataImpute is an R package for imputing missing multivariate continuous and categorical data, using the Bayesian nonparametric hierarchical model developed in Murray and Reiter (2016).

The latest stable version is available from CRAN (“install.packages(‘MixedDataImpute’)”).

bfa is an R package for fitting Bayesian factor models (Gaussian, probit, mixed and semiparametric copula models) under a range of priors for the factor loadings. These include continuous and spike-and-slab shrinkage priors. MCMC is performed in compiled code, so model fitting is typically quite fast.

The latest stable version is available from CRAN (“install.packages(‘bfa’)”).

Here you can find the MPTuner software, an implementation of adaptive median polish algorithms for anomaly detection in sensor networks. The algorithms were developed by Prof. Ernst Linder at the University of New Hampshire’s Department of Mathematics and Statistics as part of the 2007 SAMSI program on Environmental Sensor Networks. Dr. Zoe Cardon at Woods Hole provided us with test data and insight into the ecological processes involved. I wrote the software during my senior year under the mentorship of Prof. Linder. For background on the technique you may refer to this talk by Prof. Linder.
The software is available as the original Python source or as a Windows executable. The scripts should run anywhere that Python and wxPython do. The binary is for Windows platforms only and has been tested in XP, Vista and Windows 7.
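
For readers unfamiliar with the underlying technique, the following Python sketch shows plain (non-adaptive) Tukey median polish on a small two-way table, with large residuals pointing at candidate anomalies. It is not MPTuner's code and does not reproduce the adaptive algorithms developed by Prof. Linder; the function name `median_polish` and the example table are hypothetical.

```python
from statistics import median

def median_polish(table, max_iter=10, tol=1e-9):
    """Decompose table[i][j] into overall + row_eff[i] + col_eff[j] + resid[i][j]."""
    rows, cols = len(table), len(table[0])
    resid = [[float(x) for x in row] for row in table]
    overall = 0.0
    row_eff = [0.0] * rows
    col_eff = [0.0] * cols
    for _ in range(max_iter):
        # Row sweep: move each row's median from the residuals into the row effect.
        row_med = [median(r) for r in resid]
        for i in range(rows):
            row_eff[i] += row_med[i]
            for j in range(cols):
                resid[i][j] -= row_med[i]
        m = median(col_eff)          # re-center column effects into the overall term
        overall += m
        col_eff = [c - m for c in col_eff]
        # Column sweep: same operation along columns.
        col_med = [median(resid[i][j] for i in range(rows)) for j in range(cols)]
        for j in range(cols):
            col_eff[j] += col_med[j]
            for i in range(rows):
                resid[i][j] -= col_med[j]
        m = median(row_eff)          # re-center row effects into the overall term
        overall += m
        row_eff = [r - m for r in row_eff]
        if max(abs(x) for x in row_med + col_med) <= tol:
            break                    # no sweep changed anything: converged
    return overall, row_eff, col_eff, resid

# Additive table (e.g. sensors by time points) with one injected spike at [2][1].
table = [[5 + i + 10 * j for j in range(3)] for i in range(4)]
table[2][1] += 50
overall, row_eff, col_eff, resid = median_polish(table)
spike = max(((i, j) for i in range(4) for j in range(3)),
            key=lambda ij: abs(resid[ij[0]][ij[1]]))
print(spike)  # (2, 1)
```

Because medians are insensitive to a few extreme values, the spike ends up in the residuals rather than distorting the fitted row and column effects.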

Installation:

Windows binary:

From source:

  • MPTuner is a Python application. It has several dependencies which are listed below. It has been tested on these versions, but it should work on more recent versions as well.

Running:

  • (optional) Edit the config.txt file as desired.
  • Run “gui_app.exe” if you are using the Windows binary, or execute the “gui_app.py” script. At the command line you can specify a data file to load with the “-d” flag (e.g., python gui_app.py -d “datafile.csv”).
  • Load the data (File > Load data).
  • (optional) Assign names to the factors identified in the data file (Data > Show data import settings). You can save the factor definitions file once finished.
  • (optional) Select factors to group the data by, e.g., treatments (Data > Show group settings).
  • The “Group” dropdown list will be populated with the nonempty groups. Select one, adjust the tuning parameters and push “Recalculate”. After running the polish for the various groups you can save the cleaned data (File > Save data).

Known Issues:

  • I wrote MPTuner as an undergraduate project. I didn’t know any Python when I started; I was learning as I went. It shows in the code (although it’s fully functional!).
  • MPTuner uses the EnhancedStatusBar class graciously provided by Andrea Gavana. This provides the colored square when you click on a line in the plot window. On some platforms the colored square doesn’t appear.
  • On (re)calculating the median polish, you may see a DeprecationWarning indicating that scipy.stats.median is deprecated in favor of numpy.median. At present this doesn’t affect the functioning of MPTuner. The warning arises because SciPy’s nanmedian function calls SciPy’s own median function when it should call NumPy’s. My understanding is that this has been changed in newer versions of SciPy.

The data included here are derived from work supported by the National Science Foundation under Grant No. 0415938. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.