In every data project, there should be a check that the data actually looks like what you expect it to look like.
This can be as simple as stopifnot(all(data$values > 0)), but as with everything “simple”, you typically want some additional features, such as cleaner error messages, rules kept separate from your R script (e.g., in a YAML file), visualization of the results, and, last but not least, a library that does all of this as fast as possible.
        
Recently I was faced with a compressed file in NASDAQ’s ITCH protocol. As I wasn’t able to find an R package that parses the file and loads it into R for me, I spent (probably) way too much time writing one, so here it is.
        
I recently had the opportunity to listen to some great minds in the area of high-frequency data and trading.
While I won’t go into the details of what was said, I want to illustrate one point that was brought up: the importance of proper out-of-sample testing and correct variable lags in potential trading algorithms or arbitrage models.
        
The following entry explains a basic principle of finance, the so-called efficient frontier, and thus serves as a gentle introduction to one area of finance, “portfolio theory”, using R.
A second part will then concentrate on the Capital Asset Pricing Model (CAPM) and its assumptions, implications, and drawbacks.
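As a point of reference, here is the classical Markowitz mean-variance formulation, assumed here as the standard setup (the post itself may parameterize it differently). For a target return \(\mu_p\), each point on the frontier is the minimum-variance portfolio with weight vector \(w\), given expected returns \(\mu\) and covariance matrix \(\Sigma\):
\[\min_{w}\; w^\top \Sigma w \quad \text{s.t.} \quad w^\top \mu = \mu_p,\quad w^\top \mathbf{1} = 1\]
Solving this for a range of \(\mu_p\) and plotting the resulting portfolio standard deviations \(\sqrt{w^\top \Sigma w}\) against \(\mu_p\) traces out the frontier; the efficient part is the upper branch, above the global minimum-variance portfolio.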
        
This is a direct (though minor) answer to Daniel’s blog post http://daniellakens.blogspot.de/2016/01/power-analysis-for-default-bayesian-t.html, which I found very interesting, as I have been trying to get my head around Bayesian statistics for quite a while now.
        
You could say that the following post is an answer/comment/addition to Quintuitive, though I would rather consider it a small introduction to parallel computing with snowfall, using the thoughts of Quintuitive as an example.
        
Facing a simple yet frustrating formula like this one
\[xe^{ax}=b\]
and the task of solving it for x left me googling around for hours until I found salvation in Wolfram Alpha, Wikipedia, and a nice blog post with R syntax for solving a similar equation.
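For reference, the standard closed form (assuming \(a \neq 0\)) uses the Lambert W function, defined by \(W(z)e^{W(z)} = z\). Multiplying both sides by \(a\) puts the equation into that form with \(u = ax\):
\[axe^{ax} = ab \quad\Longrightarrow\quad ax = W(ab) \quad\Longrightarrow\quad x = \frac{W(ab)}{a}\]
A real solution exists only for \(ab \geq -1/e\); for \(-1/e < ab < 0\) there are two real solutions, one from each real branch (\(W_0\) and \(W_{-1}\)).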
        
When dealing with large datasets that potentially exceed the memory of your machine, it is nice to have an alternative, such as your own server running an SQL/PostgreSQL database, where you can query the data in smaller, digestible chunks.
        
Warning
After revisiting this blog post five years later, I was not able to fully reproduce the code; see the addendum for a post-mortem.
Introduction
Recently I found a good introduction to the Schelling segregation model and to agent-based modelling (ABM) in Python (Binpress article by Adil).
        
For a project, I recently faced the issue of getting a database of all aviation incidents. As I really wanted to try Hadley’s new rvest package, I thought I would use it here and share the code with you.