I’m happy to formally announce a major update to the development version of the pdfsearch R package. In brief, this version adds support for splitting the PDF by sentences instead of by lines of text. In addition, initial testing of splitting PDFs that are laid out in multiple columns has been promising. This functionality attempts to combine the multiple columns into a single column, so that the keyword searching performed by pdfsearch is stronger because the search is done in context.
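To illustrate why the splitting unit matters, here is a minimal base R sketch. It is not the pdfsearch implementation: the sample text, the variable names, and the simple "split after ., !, or ?" rule are all assumptions made for illustration.

```r
# Hypothetical text standing in for one page returned by a PDF reader;
# the line breaks mimic how text is wrapped on the page.
page_text <- paste(
  "Keyword searches work best in context. A sentence that",
  "wraps across lines is split apart by line-based searching. Sentence",
  "splitting keeps it whole.",
  sep = "\n"
)

# Line-based splitting: one element per line of text.
by_line <- strsplit(page_text, "\n", fixed = TRUE)[[1]]

# Sentence-based splitting: join the wrapped lines, then split on whitespace
# that follows ., !, or ?. This simple rule ignores abbreviations like "e.g.".
flat_text <- gsub("\\s+", " ", page_text)
by_sentence <- strsplit(flat_text, "(?<=[.!?])\\s+", perl = TRUE)[[1]]

# Search each unit for a keyword, ignoring case.
keyword <- "sentence"
line_hits <- by_line[grepl(keyword, by_line, ignore.case = TRUE)]
sentence_hits <- by_sentence[grepl(keyword, by_sentence, ignore.case = TRUE)]
```

With line-based units, each hit is a fragment that starts or stops mid-sentence; with sentence-based units, each hit is a complete sentence containing the keyword.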
For the past few years (quickly approaching a decade), a colleague and I have been scraping information about college football coaches to explore the question: which coach should the University of Minnesota hire? This project started, as most interesting projects do, at happy hour one day, debating who should be hired to replace Tim Brewster, the latest college football coaching disaster at the University of Minnesota. We decided to collect some data to explore the idea in more detail.
The simglm package has an update on CRAN, bumping the version up to 0.6.0. This update adds the ability to simulate count data (Poisson) and also fixes (I think) the Shiny app that comes with the package. As I have not posted about this package since the first CRAN release (v 0.5.0), I plan to give an overview of everything the package offers, including the new features.
This is a quick note looking for any further feedback on the simglm package prior to CRAN submission later this week. The goal is to submit Thursday or Friday this week. The last few finishing touches on the documentation are happening now, working toward a version 0.5.0 release on CRAN. For those who have not seen this package yet, the aim is to simulate regression models (single level and multilevel models) as well as to carry out empirical power analyses based on Monte Carlo simulation.
Markdown (and R Markdown) are great ways to quickly develop material without worrying about the formatting. The documents can then be compiled using the knitr or rmarkdown packages to output formats such as HTML, LaTeX, or even Word. The main drawback of this approach is that formatting of documents is limited to italics, bold, or strikethrough. Markdown does have support for inline HTML, so you can add your own formatting inline using CSS or other HTML attributes; however, this moves away from the quick markdown flavor.
I’m happy to introduce an add-on package, pdfsearch, that adds the ability to do keyword searches on PDF files. This add-on package uses the excellent pdftools package from the rOpenSci project to read in PDF files and perform keyword searches based on character strings of interest. Installation The package is currently only hosted on GitHub and can be installed with the devtools package. devtools::install_github('lebebr01/pdfsearch') Basic Example Doing a simple keyword search on a single PDF file uses the keyword_search function.
I have a simulation package that allows for the simulation of regression models including nested data structures. You can see the package on GitHub here: simReg. Over the weekend I updated the package to allow for the simulation of unbalanced designs. I’m hoping to put together a new vignette soon highlighting the functionality. I am working on a simulation that uses the unbalanced functionality, and while simulating longitudinal data I’ve found the function is much slower than its cross-sectional counterparts (and balanced designs).
While working on a three-variable interaction plot for a paper recently, I wanted to remove the leading 0’s in the x-axis text labels using ggplot2. This was primarily due to some space concerns I had for the x-axis labels. Unfortunately, I did not find an obvious way to do this on my first go-around. After tinkering a bit, I’ve found a workaround. The process is walked through below.
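For readers who want a quick idea of what such a workaround can look like, here is one possible sketch. It is not necessarily the approach taken in the post: the helper name drop_leading_zero, its digits argument, and the example data are all hypothetical. It relies on the labels argument of scale_x_continuous accepting a function that maps break values to label strings.

```r
# Hypothetical label function: format the numbers with a fixed number of
# decimals, then drop the zero that sits in front of the decimal point.
drop_leading_zero <- function(x, digits = 2) {
  labels <- formatC(x, format = "f", digits = digits)
  # "0.25" becomes ".25" and "-0.50" becomes "-.50";
  # values such as "1.00" or "10.50" are left untouched.
  sub("^(-?)0\\.", "\\1.", labels)
}

example_labels <- drop_leading_zero(c(0, 0.25, -0.5, 1))

# Use the function as the x-axis labeller (only if ggplot2 is installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Made-up data purely for illustration.
  df <- data.frame(x = c(0, 0.25, 0.5, 0.75), y = c(1, 3, 2, 4))
  p <- ggplot(df, aes(x = x, y = y)) +
    geom_point() +
    scale_x_continuous(labels = drop_leading_zero)
}
```

Passing a function keeps the breaks chosen by ggplot2 and only changes how they are printed, which avoids hard-coding label strings.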
I’d like to introduce a package that simulates regression models. This includes both single level and multilevel (i.e., hierarchical or linear mixed) models up to two levels of nesting. The package provides a unified framework to simulate all types of continuous regression models. In the future, I’d like to add the ability to simulate generalized linear models. This package is an extension of the functions I used to simulate data for my dissertation.
I was emailed by a friend who was looking into their Google location data and asked if I had ever used a JSON file in R. I said I had not, but I knew there were packages to do such things. The things I sent were things he had already tried, so what did I decide to do? I went ahead and downloaded my own Google location data.