In every data project, there should be a check that the data actually looks like what you expect it to.
This can be as simple as stopifnot(all(data$values > 0)), but as with everything “simple”, you typically want some additional features: cleaner error messages, rules separated from your R script (e.g., in a YAML file), result visualization, and last but not least, a library that does all of this as fast as possible.
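For illustration, here is a minimal hand-rolled sketch of such a check in base R; the column names and the rules are made up for the example:

    # a tiny validation helper; `values` and `id` are hypothetical columns
    check_data <- function(data) {
      stopifnot(
        "values must be positive" = all(data$values > 0),
        "id must not contain NAs" = !anyNA(data$id)
      )
      invisible(data)
    }

    check_data(data.frame(id = c("a", "b"), values = c(1.5, 2.3)))

Named conditions in stopifnot() (available since R 3.5) already buy you the cleaner error messages; moving the rules out of the script is where a dedicated package earns its keep.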
Recently I was faced with a file compressed in NASDAQ’s ITCH protocol. As I wasn’t able to find an R package that parses the file and loads it into R for me, I spent (probably) way too much time writing one, so here it is.
I recently had the opportunity to listen to some great minds in the area of high-frequency data and trading.
While I won’t go into the details of what was said, I wanted to illustrate two points that were brought up: the importance of proper out-of-sample testing and of properly lagged variables in potential trading algorithms or arbitrage models.
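As a minimal sketch of the lagging point (the data frame and the signal are made up for the example): a signal computed from today’s close may only drive tomorrow’s trade, otherwise the backtest peeks into the future.

    # `prices` is a hypothetical data frame with columns `date` and `close`
    prices$ret    <- c(NA, diff(log(prices$close)))  # today's log return
    prices$signal <- prices$ret > 0                  # toy signal, known at the close of day t
    prices$trade  <- c(NA, head(prices$signal, -1))  # only tradable on day t + 1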
The following entry explains a basic principle of finance, the so-called efficient frontier, and thus serves as a gentle introduction to one area of finance, “portfolio theory”, using R.
A second part will then concentrate on the Capital Asset Pricing Model (CAPM) and its assumptions, implications, and drawbacks.
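To give a taste of what is to come, here is a minimal two-asset sketch; the expected returns, volatilities, and correlation are made-up illustration values:

    mu    <- c(0.05, 0.10)        # expected returns of assets 1 and 2
    sigma <- c(0.10, 0.20)        # their standard deviations
    rho   <- 0.3                  # correlation between the two
    w     <- seq(0, 1, by = 0.01) # weight of asset 1

    port_mu <- w * mu[1] + (1 - w) * mu[2]
    port_sd <- sqrt(w^2 * sigma[1]^2 + (1 - w)^2 * sigma[2]^2 +
                    2 * w * (1 - w) * rho * sigma[1] * sigma[2])

    # the upper-left edge of this curve is the efficient frontier
    plot(port_sd, port_mu, type = "l", xlab = "Risk (sd)", ylab = "Expected return")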
This is a direct (though minor) answer to Daniel’s blogpost http://daniellakens.blogspot.de/2016/01/power-analysis-for-default-bayesian-t.html, which I found very interesting, as I have been trying to get my head around Bayesian statistics for quite a while now.
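As a rough sketch of what such a power analysis can look like by simulation (the sample size, effect size, and the BF > 3 criterion are illustration values, and I assume the BayesFactor package):

    library(BayesFactor)

    set.seed(42)
    n_sim <- 1000; n <- 50; d <- 0.5
    hits <- replicate(n_sim, {
      x <- rnorm(n, mean = d)
      y <- rnorm(n)
      extractBF(ttestBF(x, y))$bf > 3   # did the default Bayes factor cross 3?
    })
    mean(hits)  # estimated "power" under this criterion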
You could say that the following post is an answer/comment/addition to Quintuitive, though I would consider it a small introduction to parallel computing with snowfall, using the thoughts of Quintuitive as an example.
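A minimal snowfall sketch (the worker function and the four-core setup are made up for the example):

    library(snowfall)

    slow_task <- function(x) { Sys.sleep(0.1); x^2 }  # stand-in for real work

    sfInit(parallel = TRUE, cpus = 4)  # start a local cluster
    sfExport("slow_task")              # make the function visible to the workers
    res <- sfLapply(1:100, slow_task)  # the parallel drop-in for lapply()
    sfStop()                           # always shut the cluster down again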
Facing a simple, yet frustrating formula like this
\[xe^{ax}=b\]
and the task of solving it for x left me googling around for hours until I found salvation in Wolfram Alpha, Wikipedia, and a nice blogpost with R syntax for solving a similar equation.
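The salvation, in short: multiplying both sides by a gives \[(ax)e^{ax}=ab,\] so ax = W(ab) with W the Lambert W function, i.e., x = W(ab)/a. Numerically, base R’s uniroot() gets you there without any special function (a, b, and the search interval below are illustration values):

    a <- 2; b <- 3
    f <- function(x) x * exp(a * x) - b
    uniroot(f, interval = c(0, 5))$root  # ~0.716, matching W(a * b) / a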
When dealing with large datasets that potentially exceed the memory of your machine, it is nice to have another option, such as your own server with a SQL/PostgreSQL database on it, where you can query the data in smaller, digestible chunks.
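A minimal sketch of chunk-wise fetching with DBI (the connection details and the table name events are placeholders; I assume the RPostgres driver for a PostgreSQL server):

    library(DBI)

    con <- dbConnect(RPostgres::Postgres(),
                     host = "myserver", dbname = "mydb",
                     user = "me", password = Sys.getenv("PGPASSWORD"))

    res <- dbSendQuery(con, "SELECT * FROM events")
    while (!dbHasCompleted(res)) {
      chunk <- dbFetch(res, n = 10000)  # pull 10,000 rows at a time
      # ... process the chunk, e.g., aggregate and append to a result ...
    }
    dbClearResult(res)
    dbDisconnect(con)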
Warning
After revisiting this post 5 years later, I was not able to fully reproduce the code; see the addendum for a post-mortem.
Introduction
Recently I found a good introduction to the Schelling Segregation Model and to Agent-Based Modelling (ABM) in Python (Binpress article by Adil).
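The core of the model fits in a few lines; here is a rough R sketch of one update sweep (grid size, group shares, and the 0.5 similarity threshold are illustration values):

    set.seed(1)
    n <- 20  # grid side length; 0 = empty, 1 and 2 are the two groups
    grid <- matrix(sample(c(0, 1, 2), n * n, replace = TRUE,
                          prob = c(0.2, 0.4, 0.4)), n, n)

    is_unhappy <- function(g, i, j, threshold = 0.5) {
      nb <- g[max(1, i - 1):min(n, i + 1), max(1, j - 1):min(n, j + 1)]
      nb <- nb[nb != 0]
      similar <- sum(nb == g[i, j]) - 1  # neighbours of the same group (minus self)
      total   <- length(nb) - 1          # occupied neighbours (minus self)
      total > 0 && similar / total < threshold
    }

    # one sweep: every unhappy agent moves to a random empty cell
    for (i in 1:n) for (j in 1:n) {
      if (grid[i, j] != 0 && is_unhappy(grid, i, j)) {
        empty <- which(grid == 0)
        if (length(empty) > 0) {
          target <- empty[sample.int(length(empty), 1)]
          grid[target] <- grid[i, j]
          grid[i, j]   <- 0
        }
      }
    }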
For a project I recently faced the issue of getting a database of all aviation incidents. As I really wanted to try Hadley’s new rvest package, I thought I would give it a try and share the code with you.
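A minimal rvest sketch (the URL and the assumption that the incidents sit in an HTML table are placeholders, not the actual source used in the post):

    library(rvest)

    page   <- read_html("http://example.com/incidents")  # placeholder URL
    tables <- html_table(html_nodes(page, "table"), fill = TRUE)
    str(tables[[1]])  # inspect the first scraped table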