''' Helper functions for writing per-second measurement results to a file that
might rotate, as well as classes for reading those results from files later.

**Note: The information here is only partially true until pastly/flashflow#4 is
implemented and this message is removed.**

Results are "logged" via :mod:`logging` at level ``INFO``. It is important that
the user does not edit the way these messages are logged.
If the user would like to rotate the output file, e.g. with `logrotate
<https://linux.die.net/man/8/logrotate>`_, they can do that because by default
(*and this should not be changed lightly*) these "log messages" get "logged"
via a :class:`logging.handlers.WatchedFileHandler`, which handles this
situation gracefully.

Usage
=====

Call :func:`write_begin` once at the beginning of the active measurement phase.
As measurement results come in every second from measurers, call
:func:`write_meas` for each. Likewise for per-second background traffic reports
and :func:`write_bg`. As soon as active measurement is over, call
:func:`write_end`.
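
A minimal sketch of the logging setup described above, not the project's actual
configuration. The logger name ``results_sketch``, the temporary output path,
and the bare ``%(message)s`` format are assumptions made for illustration. The
lines are written in the bare-integer form that the ``__str__`` methods in this
module emit.

```python
import logging
import logging.handlers
import os
import tempfile

# Hypothetical output location, used only for this sketch.
out_path = os.path.join(tempfile.mkdtemp(), 'results.log')

# Hypothetical logger name; it does not propagate so nothing else formats it.
results_log = logging.getLogger('results_sketch')
results_log.setLevel(logging.INFO)
results_log.propagate = False

# WatchedFileHandler reopens the file if it was moved or removed between
# writes, which is what makes external rotation (e.g. logrotate) safe.
handler = logging.handlers.WatchedFileHandler(out_path)
handler.setFormatter(logging.Formatter('%(message)s'))
results_log.addHandler(handler)

meas_id = 58234
fp = 'B0430D21D6609459D141078C0D7758B5CA753B6F'
results_log.info('%d %d BEGIN %s', meas_id, 1591979504, fp)
results_log.info('%d %d MEASR %d', meas_id, 1591979505, 5059082)
results_log.info('%d %d BG %d %d', meas_id, 1591979505, 744904, 1659029)
results_log.info('%d %d END', meas_id, 1591979534)
handler.close()

with open(out_path) as fd:
    written = fd.read().splitlines()
```

``WatchedFileHandler`` is intended for Unix-like systems, so this sketch assumes
one.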

Output Format
=============

Output is line based. Multiple measurements can take place simultaneously, in
which case per-second results from measurements of different relays can be
interleaved.

A **BEGIN** line signals the start of data for the measurement of a relay. An
**END** line signals the end. Between these lines there are zero or more result
lines for the measurement of this relay, each with a per-second result from
either a measurer measuring that relay or that relay itself reporting the
amount of background traffic it saw that second.
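
Because lines from simultaneous measurements can be interleaved, a reader has
to group them by measurement ID before interpreting them. A rough sketch of
that grouping follows. The helper ``group_by_meas_id``, the second measurement
ID, and the all-zero fingerprint are hypothetical and exist only for this
illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_meas_id(lines: Iterable[str]) -> Dict[int, List[str]]:
    # Hypothetical helper: bucket the remainder of each line under its
    # leading measurement ID. Blank lines, comment lines, and lines without
    # an integer first word are skipped.
    groups: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue
        try:
            meas_id = int(parts[0])
        except ValueError:
            continue
        groups[meas_id].append(parts[1])
    return dict(groups)


# Two measurements whose lines are interleaved in one file. The second
# measurement ID and the all-zero fingerprint are made up for this sketch.
interleaved = [
    '58234 1591979504 BEGIN B0430D21D6609459D141078C0D7758B5CA753B6F',
    '58235 1591979504 BEGIN 0000000000000000000000000000000000000000',
    '58234 1591979505 MEASR 5059082',
    '58235 1591979505 MEASR 1234',
    '58234 1591979534 END',
    '58235 1591979534 END',
]
grouped = group_by_meas_id(interleaved)
```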

BEGIN line
----------

::

    <meas_id> <time> BEGIN <fp>

Where:

- ``meas_id``: the measurement ID for this measurement
- ``time``: the integer unix timestamp at which active measurement began.
- ``fp``: the fingerprint of the relay this BEGIN message is for.

Example::

    58234 1591979504 BEGIN B0430D21D6609459D141078C0D7758B5CA753B6F

END line
--------

::

    <meas_id> <time> END

Where:

- ``meas_id``: the measurement ID for this measurement
- ``time``: the integer unix timestamp at which active measurement ended.

Example::

    58234 1591979534 END

Results line
------------

::

    <meas_id> <time> <is_bg> GIVEN=<given> TRUSTED=<trusted>

Where:

- ``meas_id``: the measurement ID for this measurement
- ``time``: the integer unix timestamp at which this result was received.
- ``is_bg``: 'BG' if this result is a report from the relay on the number of
  background bytes it saw in the last second, or 'MEASR' if this is a result
  from a measurer
- ``given``: the number of bytes reported
- ``trusted``: if a bg report from the relay, the maximum ``given`` is trusted
  to be; or if a measurer result, then the same as ``given``.

Both ``given`` and ``trusted`` are in bytes. Yes, for measurer lines it is
redundant to specify both.

Background traffic reports from the relay include the raw actual reported value
in ``given``; if the relay is malicious and claims 8 TiB of background traffic
in the last second, you will see that here. ``trusted`` is the **max** that
``given`` is trusted to be. When reading results from this file, use
``min(given, trusted)`` as the trusted number of background bytes this second.
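
The clamping rule can be sketched as below. ``trusted_bg_bytes`` is a
hypothetical helper name, and the first two value pairs are taken from the
examples in this docstring.

```python
def trusted_bg_bytes(given: int, trusted: int) -> int:
    # Hypothetical helper: never credit the relay with more background
    # bytes than the ``trusted`` cap, whatever it reported in ``given``.
    return min(given, trusted)


honest = trusted_bg_bytes(744904, 1659029)    # GIVEN is below the cap
inflated = trusted_bg_bytes(671858, 50960)    # GIVEN exceeds the cap
absurd = trusted_bg_bytes(8 * 2**40, 50960)   # an 8 TiB claim is cut to the cap
```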

Example::

    # bg report from relay, use GIVEN b/c less than TRUSTED
    58234 1591979083 BG GIVEN=744904 TRUSTED=1659029
    # bg report from relay, use TRUSTED b/c less than GIVEN
    58234 1591979042 BG GIVEN=671858 TRUSTED=50960
    # result from measurer, always trusted
    58234 1591979083 MEASR GIVEN=5059082 TRUSTED=5059082
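
One possible way to combine result lines like these into a single number is
sketched here: sum the trusted bytes for each second, then take the median over
seconds. The names ``Result`` and ``per_second_totals`` and the tuple layout are
assumptions for this illustration. Unlike the ``Meas`` class in this module,
which stores seconds in a zero-filled list indexed from the start time, this
sketch keys totals by timestamp and so only counts seconds that have at least
one line.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical layout: (ts, given, trusted); trusted is None for a measurer.
Result = Tuple[int, int, Optional[int]]


def per_second_totals(results: Iterable[Result]) -> Dict[int, int]:
    # Sum bytes per timestamp. Measurer results are added as-is, background
    # reports are capped at their trusted value first.
    totals: Dict[int, int] = defaultdict(int)
    for ts, given, trusted in results:
        if trusted is None:
            totals[ts] += given
        else:
            totals[ts] += min(given, trusted)
    return dict(totals)


example = [
    (1591979083, 744904, 1659029),   # BG, GIVEN below TRUSTED
    (1591979042, 671858, 50960),     # BG, GIVEN above TRUSTED
    (1591979083, 5059082, None),     # MEASR
]
totals = per_second_totals(example)
summary = median(totals.values())
```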

'''

import logging
from statistics import median
from typing import Optional, List


log = logging.getLogger(__name__)


def _try_parse_int(s: str) -> Optional[int]:
    ''' Try to parse an integer from the given string. If impossible, return
    ``None``. '''
    try:
        return int(s)
    except (ValueError, TypeError):
        return None


def _ensure_len(lst: List[int], min_len: int):
    ''' Ensure that the given list is at least ``min_len`` items long. If it
    isn't, append zeros to the right until it is. '''
    if len(lst) < min_len:
        lst += [0] * (min_len - len(lst))


class Meas:
    ''' Accumulate ``MeasLine*`` objects into a single measurement summary.

    The first measurement line you should see is a :class:`MeasLineBegin`;
    create a :class:`Meas` object with it. Then pass each :class:`MeasLineData`
    that you encounter to either :meth:`Meas.add_measr` or :meth:`Meas.add_bg`
    based on where it came from. Finally pass the :class:`MeasLineEnd` to
    :meth:`Meas.set_end` to tell the object it has all the data.

    Not much is done to ensure you're using this data storage class correctly.
    For example:

    - You can add more :class:`MeasLineData` after marking the end.
    - You can pass untrusted :class:`MeasLineData` from the relay to the
      :meth:`Meas.add_measr` function where they will be treated as
      trusted.
    - You can get the :meth:`Meas.result` before all data lines have been
      given.
    - You can provide data from different measurements for different
      relays.

    **You shouldn't do these things**, but you can. It's up to you to use your
    tools as prescribed.
    '''

    _begin: 'MeasLineBegin'
    _end: Optional['MeasLineEnd']
    _data: List[int]

    def __init__(self, begin: 'MeasLineBegin'):
        self._begin = begin
        self._end = None
        self._data = []

    @property
    def relay_fp(self) -> str:
        ''' The relay measured, as given in the initial :class:`MeasLineBegin`.
        '''
        return self._begin.relay_fp

    @property
    def meas_id(self) -> int:
        ''' The measurement ID, as given in the initial :class:`MeasLineBegin`.
        '''
        return self._begin.meas_id

    @property
    def start_ts(self) -> int:
        ''' The integer timestamp for when the measurement started, as given in
        the initial :class:`MeasLineBegin`. '''
        return self._begin.ts

    def _ensure_len(self, data_len: int):
        ''' Ensure we can store at least ``data_len`` items, expanding our data
        list to the right with zeros as necessary. '''
        if len(self._data) < data_len:
            self._data += [0] * (data_len - len(self._data))

    def add_measr(self, data: 'MeasLineData'):
        ''' Add a :class:`MeasLineData` to our results that came from a
        measurer.

        As it came from a measurer, we trust it entirely (and there's no
        ``trusted_bw`` member) and simply add it to the appropriate second.
        '''
        idx = data.ts - self.start_ts
        _ensure_len(self._data, idx + 1)
        self._data[idx] += data.given_bw

    def add_bg(self, data: 'MeasLineData'):
        ''' Add a :class:`MeasLineData` to our results that came from the relay
        and is regarding the amount of background traffic.

        As it came from the relay, we do not trust a ``given_bw > trusted_bw``.
        Thus we add the minimum of the two to the appropriate second.
        '''
        idx = data.ts - self.start_ts
        _ensure_len(self._data, idx + 1)
        assert data.trusted_bw is not None  # for mypy, bg will have this
        self._data[idx] += min(data.given_bw, data.trusted_bw)

    def set_end(self, end: 'MeasLineEnd'):
        ''' Indicate that there is no more data to be loaded into this
        :class:`Meas`. '''
        self._end = end

    def have_all_data(self) -> bool:
        ''' Return ``True`` if the end has been set, meaning we do not expect
        to be given more data. '''
        return self._end is not None

    def result(self) -> float:
        ''' Calculate and return the result of this measurement: the median of
        the per-second byte totals. '''
        return median(self._data)


class MeasLine:
    ''' Parent class for other ``MeasLine*`` types. You should only ever need
    to interact with this class directly via its :meth:`MeasLine.parse` method.
    '''
    def __init__(self, meas_id: int, ts: int):
        self.meas_id = meas_id
        self.ts = ts

    def __str__(self):
        return '%d %d' % (
            self.meas_id,
            self.ts)

    @staticmethod
    def parse(s: str) -> Optional['MeasLine']:
        ''' Try to parse a MeasLine subclass from the given line ``s``. If
        impossible, return ``None``. '''
        s = s.strip()
        # ignore comment lines
        if s.startswith('#'):
            return None
        words = s.split()
        # minimum line length, in words, is 3: end lines have 3 words
        # maximum line length, in words, is 5: bg data lines have 5
        MIN_WORD_LEN = 3
        MAX_WORD_LEN = 5
        if len(words) < MIN_WORD_LEN or len(words) > MAX_WORD_LEN:
            return None
        # split off the prefix words (words common to all measurement data
        # lines).
        prefix, words = words[:2], words[2:]
        # try convert each one, bail if unable
        meas_id = _try_parse_int(prefix[0])
        ts = _try_parse_int(prefix[1])
        if meas_id is None or ts is None:
            return None
        # now act differently based on what type of line we seem to have
        if words[0] == 'BEGIN':
            # BEGIN <fp>
            if len(words) != 2:
                return None
            fp = words[1]
            return MeasLineBegin(fp, meas_id, ts)
        elif words[0] == 'END':
            # END
            return MeasLineEnd(meas_id, ts)
        elif words[0] == 'MEASR':
            # MEASR <given> (a bare integer; a GIVEN= prefix is not parsed)
            if len(words) != 2 or _try_parse_int(words[1]) is None:
                return None
            res = _try_parse_int(words[1])
            assert isinstance(res, int)  # for mypy
            return MeasLineData(res, None, meas_id, ts)
        elif words[0] == 'BG':
            # BG <given> <trusted> (bare integers; GIVEN=/TRUSTED= prefixes
            # are not parsed)
            if len(words) != 3 or \
                    _try_parse_int(words[1]) is None or \
                    _try_parse_int(words[2]) is None:
                return None
            given = _try_parse_int(words[1])
            trusted = _try_parse_int(words[2])
            assert isinstance(given, int)  # for mypy
            assert isinstance(trusted, int)  # for mypy
            return MeasLineData(given, trusted, meas_id, ts)
        return None


class MeasLineBegin(MeasLine):
    ''' A BEGIN line: the start of data for the measurement of a relay. '''
    def __init__(self, fp: str, *a, **kw):
        super().__init__(*a, **kw)
        self.relay_fp = fp

    def __str__(self):
        prefix = super().__str__()
        return prefix + ' BEGIN ' + self.relay_fp


class MeasLineEnd(MeasLine):
    ''' An END line: the end of data for a measurement. '''
    def __init__(self, *a, **kw):
        super().__init__(*a, **kw)

    def __str__(self):
        prefix = super().__str__()
        return prefix + ' END'


class MeasLineData(MeasLine):
    ''' A per-second result line, from either a measurer (``trusted_bw`` is
    ``None``) or the relay's background traffic report. '''
    def __init__(self, given_bw: int, trusted_bw: Optional[int], *a, **kw):
        super().__init__(*a, **kw)
        self.given_bw = given_bw
        self.trusted_bw = trusted_bw

    def is_bg(self) -> bool:
        return self.trusted_bw is not None

    def __str__(self):
        prefix = super().__str__()
        if self.trusted_bw is None:
            # result from a measurer
            return prefix + ' MEASR %d' % (self.given_bw,)
        # result from relay
        return prefix + ' BG %d %d' % (self.given_bw, self.trusted_bw)


def write_begin(fp: str, meas_id: int, ts: int):
    ''' Write a log line indicating the start of the given relay's measurement.

    :param fp: the fingerprint of the relay
    :param meas_id: the measurement ID
    :param ts: the unix timestamp at which the measurement began
    '''
    log.info(MeasLineBegin(fp, meas_id, ts))


def write_end(meas_id: int, ts: int):
    ''' Write a log line indicating the end of the given relay's measurement.

    :param meas_id: the measurement ID
    :param ts: the unix timestamp at which the measurement ended
    '''
    log.info(MeasLineEnd(meas_id, ts))


def write_meas(meas_id: int, ts: int, res: int):
    ''' Write a single per-second result from a measurer to our results.

    :param meas_id: the measurement ID
    :param ts: the unix timestamp at which the result came in
    :param res: the number of measured bytes
    '''
    log.info(MeasLineData(res, None, meas_id, ts))


def write_bg(meas_id: int, ts: int, given: int, trusted: int):
    ''' Write a single per-second report of bg traffic from the relay to our
    results.

    :param meas_id: the measurement ID
    :param ts: the unix timestamp at which the result came in
    :param given: the number of reported bg bytes
    :param trusted: the maximum given should be (from our perspective in this
        logging code, it's fine if given is bigger than trusted)
    '''
    log.info(MeasLineData(given, trusted, meas_id, ts))

363 :param ts: the unix timestamp at which the result came in 

364 :param given: the number of reported bg bytes 

365 :param trusted: the maximum given should be (from our perspective in this 

366 logging code, it's fine if given is bigger than trusted) 

367 ''' 

368 log.info(MeasLineData(given, trusted, meas_id, ts))