Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Latest commit

 

History

History
History
230 lines (188 loc) · 12.8 KB

File metadata and controls

230 lines (188 loc) · 12.8 KB
Copy raw file
Download raw file
Open symbols panel
Edit and raw actions
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
---
title: "geom_step"
author: "Win-Vector LLC"
date: "June 3, 2016"
output:
md_document:
variant: markdown_github
---
```{r setup, include=FALSE}
library('ggplot2')
library('dplyr')
library('tidyr')
# create a small example
set.seed(32535)
quotes <- data.frame(
quoteTime=strptime(c(
'2016-01-04 09:14:00','2016-01-04 11:45:17', '2016-01-04 15:25:00',
'2016-01-05 10:12:13',
'2016-01-06 09:02:00','2016-01-06 15:10:00', '2016-01-06 15:27:00'
),'%Y-%m-%d %H:%M:%S'),
stringsAsFactors=FALSE
)
quotes$date <- as.Date(quotes$quoteTime)
quotes$askPrice <- round(10*exp(0.1*(seq_len(nrow(quotes)) + cumsum(rnorm(nrow(quotes))))),
digits=2)
quotes$bidPrice <- round(quotes$askPrice - (0.1+runif(nrow(quotes))),
digits=2)
trades <- data.frame(tradeTime=quotes$quoteTime,
date=as.Date(quotes$quoteTime),
tradePrice=ifelse(runif(nrow(quotes))>0.5,quotes$askPrice,quotes$bidPrice),
stringsAsFactors=FALSE)
trades <- trades[runif(nrow(trades))>0.5,]
trades$quantity <- 100*(1+sample.int(5,nrow(trades),replace=TRUE))
breaks <- (floor(min(quotes$bidPrice,trades$tradePrice))-1):(ceiling(max(quotes$askPrice,trades$tradePrice))+1)
#' Find each value known at or before a given time.
#'
#' Useful for filling in irregular time observations.
#'
#' @param startValue numeric scalar value to use until we see data.
#' @param dataTimes vector non-decreasing times we have data.
#' @param dataValues vector same length as dataTimes, values known at times.
#' @param measurementTimes vector non-decreasing times we want measurement for.
#' @return vector of values known at measurementTimes
lastKnownValue <- function(startValue,dataTimes,dataValues,measurementTimes) {
measurementValues <- numeric(length(measurementTimes))
lastValue <- startValue
nextI <- 1
for(j in seq_len(length(measurementTimes))) {
while((nextI<=length(dataTimes))&&(dataTimes[[nextI]]<=measurementTimes[[j]])) {
lastValue <- dataValues[[nextI]]
nextI <- nextI + 1
}
measurementValues[[j]] <- lastValue
}
measurementValues
}
# build a structure showing market open hours
openTime <- '09:00:00'
closeTime <- '15:30:00'
dates <- sort(unique(quotes$date))
rbind(
data.frame(
time=strptime(paste(dates,openTime),'%Y-%m-%d %H:%M:%S'),
what='open',
stringsAsFactors=FALSE),
data.frame(
time=strptime(paste(dates,closeTime),'%Y-%m-%d %H:%M:%S'),
what='close',
stringsAsFactors=FALSE)) %>% arrange(time) -> openClose
openClose$date <- as.Date(openClose$time)
openClose$askPrice <-
lastKnownValue(NA,quotes$quoteTime,quotes$askPrice,openClose$time)
openClose$bidPrice <-
lastKnownValue(NA,quotes$quoteTime,quotes$bidPrice,openClose$time)
openClose %>% select(date,time,what,askPrice,bidPrice) -> openClose
```
[`geom_step`](http://docs.ggplot2.org/current/geom_path.html) is an interesting geom supplied by the [R](https://cran.r-project.orgli) package [ggplot2](http://ggplot2.org). It is an appropriate rendering option for financial market data and we will show how and why to use it in this article.
Let's take a simple example of plotting market data. In this case we are plotting
the "ask price" (the publicly published price an item is available for purchase
at a given time), the "bid price" (the publicly published price an item can be sold for
at a given time), and "trades" (past purchases and sales).
Most markets maintain these "quoted" prices as an order book and the public ask price is always
greater than the public bid price (else we would have a "crossed market"). We can also track recent
transactions or trades. Here is some example (made-up) data.
```{r data}
print(quotes)
print(trades)
```
Notice each revision of the book (notification of a bid price, ask price, or both)
happens at a specific time. Ask and bid prices are good until they are revised or withdrawn.
There is some issue as to what is the "price" of a financial instrument (say in this case a stock).
Money only changes hands on trades- so past quotes that were never "hit" or traded against in some sense never happened (in fact this is becoming a problem called "flashing"). So market participants can somewhat manipulate bids and asks as long as they don't cross. Asks and bids represent risk or a one-sided opinion on price but can not be trusted (especially when the "bid ask gap" is very large).
Trades cost fees and transfer money, so they are evidence of two parties agreeing on price for a moment. But all trades you know about are in the past. Just because somebody purchased some shares of IBM in the past for $120 a share doesn't mean you can do the same. You could only make such a purchase if there is an appropriate ask price in the market (or you place your own limit order forming a bid that somebody else hits).
What I am trying to say is the classic "ticker tape pattern" graph shown below drawing only trades and connecting them with sloping lines is not appropriate for plotting markets (especially when plotting high frequency or in-day data).
```{r plottrades}
ggplot(data=trades,aes(x=tradeTime,y=tradePrice)) +
geom_line() + geom_point()
```
There is a lot wrong with such graphs.
* We have plotted only past trades, so we have no idea what _we_ would have had to pay to buy stock or gotten to sell stock at any time.
* The sloped segments "leak information" from the future as right after the trade the line slope tells you if the next trade in the future is going to be at a higher or lower price than the trade at hand. It is important in graphing financial instruments to have graph of where at each time in the graph we are plotting only things that are known by that time. This is also why we should not use standard smoothing curves such as [`geom_smooth`](http://docs.ggplot2.org/current/geom_smooth.html) as the defaults use data from the past and future to perform the smoothing (instead should use a trailing window such as exponential smoothing).
(Side note: if anybody has some good code to make `geom_smooth` perform exponential smoothing in all cases, including grouping and facets I would really like a copy. Right now I have to join in smoothed data as new column as I have never completely grocked all of the implementation interface requirements for new `ggplot2` statistics in their full production complexity.)
If all that seems complicated, scary, unpleasant and technical: that is the right way to think. Markets are not safe, simple, or pleasant. They can be reasoned about and worked with, but it is wrong to think they are simple or easy.
An (unfortunately) more complicated (and slightly less legible graph) is needed to try and faithfully present the information. Since asks and bids are good until withdrawn and revised we render then with a step shape (such as generated by `ggplot2::geom_step`) and since trades happen only at a single time (and are not a promise going forward) we render them with points. Such a graph is given below.
```{r plotstep1}
ggplot() +
geom_step(data=quotes,mapping=aes(x=quoteTime,y=askPrice),
linetype=2,color='#1b9e77',alpha=0.5) +
geom_step(data=quotes,mapping=aes(x=quoteTime,y=bidPrice),
linetype=2,color='#d95f02',alpha=0.5) +
geom_point(data=trades,mapping=(aes(x=tradeTime,y=tradePrice))) +
ylab('price') + xlab('time') + scale_y_log10(breaks=breaks)
```
The step functions propagate flat lines forward from quote revisions, correctly indicating what ask price and bid price were in effect at all times. Trades are shown as dots since they have no propagation. Each item drawn on the graph at a given time was actually know by that time (so a person or trading strategy would also have access to such information at that time).
Trades that occur nearer the ask price can be considered "buyer initiated" and trades that occur near the bid price are considered can be considered "seller initiated", which we can indicate through color.
```{r plotstep2}
mids <- (lastKnownValue(NA,quotes$quoteTime,quotes$askPrice,trades$tradeTime)+
lastKnownValue(NA,quotes$quoteTime,quotes$bidPrice,trades$tradeTime))/2
trades$type <- ifelse(trades$tradePrice>=mids,'buy','sell')
ggplot() +
geom_step(data=quotes,mapping=aes(x=quoteTime,y=askPrice),
linetype=2,color='#1b9e77',alpha=0.5) +
geom_step(data=quotes,mapping=aes(x=quoteTime,y=bidPrice),
linetype=2,color='#d95f02',alpha=0.5) +
geom_point(data=trades,mapping=(aes(x=tradeTime,y=tradePrice,color=type))) +
ylab('price') + xlab('time') + scale_y_log10(breaks=breaks) +
scale_color_brewer(palette = 'Dark2')
```
This is a good time to point out a problem in these graphs. We are mostly plotting times when the market is closed. Most of the space is wasted. In the graph below we indicate (fictitious) market hours by shading the "market open hours" to illustrate the issue.
```{r plot2}
print(openClose)
openClose %>% select(date,time,what) %>% spread(what,time) -> marketHours
ggplot() +
geom_step(data=quotes,mapping=aes(x=quoteTime,y=askPrice),
linetype=2,color='#1b9e77',alpha=0.5) +
geom_step(data=quotes,mapping=aes(x=quoteTime,y=bidPrice),
linetype=2,color='#d95f02',alpha=0.5) +
geom_point(data=trades,mapping=(aes(x=tradeTime,y=tradePrice,color=type))) +
geom_rect(data=marketHours,
mapping=aes(xmin=open,xmax=close,ymin=0,ymax=Inf),
fill='blue',alpha=0.3) +
ylab('price') + xlab('time') + scale_y_log10(breaks=breaks) +
scale_color_brewer(palette = 'Dark2')
```
The easiest way to fix this in `ggplot2` would be to use `facet_wrap`, but this crashes (at least for `ggplot2` version `2.1.0` current on Cran 2016-06-03) with
the very cryptic error message as shown below.
```{r ploterror, error=TRUE}
ggplot() +
geom_step(data=quotes,mapping=aes(x=quoteTime,y=askPrice),
linetype=2,color='#1b9e77',alpha=0.5) +
geom_step(data=quotes,mapping=aes(x=quoteTime,y=bidPrice),
linetype=2,color='#d95f02',alpha=0.5) +
geom_point(data=trades,mapping=(aes(x=tradeTime,y=tradePrice,color=type))) +
facet_wrap(~date,scale='free_x') +
ylab('price') + xlab('time') + scale_color_brewer(palette = 'Dark2')
```
Despite the message "invalid line type" the error is not the user's selection of linetype.
It is easier to see what is going on if we replace `geom_step` with `geom_line` as we show below.
```{r plotlineq, error=TRUE}
ggplot() +
geom_line(data=quotes,mapping=aes(x=quoteTime,y=askPrice),
linetype=2,color='#1b9e77',alpha=0.5) +
geom_line(data=quotes,mapping=aes(x=quoteTime,y=bidPrice),
linetype=2,color='#d95f02',alpha=0.5) +
geom_point(data=trades,mapping=(aes(x=tradeTime,y=tradePrice,color=type))) +
facet_wrap(~date,scale='free_x') +
ylab('price') + xlab('time') + scale_y_log10(breaks=breaks) +
scale_color_brewer(palette = 'Dark2')
```
The above graph is now using sloped lines to connect ask price and bid price revisions (given the false
impression that these intermediate prices were ever available and essentially "leaking information from the future" into the visual presentation). However, we get a graph and a more reasonable warning message: "geom_path: Each group consists of only one observation." There was only one quote revision on 2016-01-05 so as `facet_wrap` treats each facet as sub-graph (and not as a portal into a single larger graph): days with fewer than 2 quote revisions have trouble drawing paths. The trouble causes the (deceptive) blank facet for 2016-01-05 if we are using simple sloped lines (`geom_line`) and seems to error out on the more complicated `geom_step`.
In my opinion `geom_step` should "fail a bit gentler" on this example (as <code>geom_line</code> already does). In any case the correct domain specific fix is to regularize the data a bit by adding market open and close information. In many markets the open and closing prices are set by specific mechanisms (such as an opening auction and a closing volume or time weighted average). For our example we will just use last known price (which we have already prepared).
```{r plotfixed}
openClose %>% mutate(quoteTime=time) %>%
bind_rows(quotes) %>%
arrange(time) %>%
select(date,askPrice,bidPrice,quoteTime) -> joinedData
ggplot() +
geom_step(data=joinedData,mapping=aes(x=quoteTime,y=askPrice),
linetype=2,color='#1b9e77',alpha=0.5) +
geom_step(data=joinedData,mapping=aes(x=quoteTime,y=bidPrice),
linetype=2,color='#d95f02',alpha=0.5) +
geom_point(data=trades,mapping=(aes(x=tradeTime,y=tradePrice,color=type))) +
facet_wrap(~date,scale='free_x') +
ylab('price') + xlab('time') + scale_y_log10(breaks=breaks) +
scale_color_brewer(palette = 'Dark2')
```
The above graph is pretty good. In fact easily producing a graph like this in R using [`dygraphs`](https://github.com/rstudio/dygraphs) is [currently an open issue](https://github.com/rstudio/dygraphs/issues/70).
Morty Proxy This is a proxified and sanitized view of the page, visit original site.