
Niko's Project Corner

Real-time car tracking and counting

Description: Tracking and counting cars for automatic traffic statistics
Languages: Matlab
Tags: Computer Vision, FFT
Duration: Summer 2014
Modified: 7th June 2014

From my office window I've got an unobstructed side view of the Ring Road I (Kehä I) in Espoo, Finland. It is one of the busiest roads in Finland, carrying up to 100,000 cars per day. I wanted to create a program which would receive a video feed from a webcam and process the images in real time on common hardware.

Object tracking is a fairly well-studied problem already, but I wanted to take advantage of the special nature of this problem: the cars are known to move in a purely horizontal direction, and I didn't want to write complex background learning and separation code. I also didn't need to know the cars' locations precisely; the main interest was in the number of cars and their velocities.

Figure 1: An example result of detecting and tracking cars from a side view. Green vertical lines indicate the cars' estimated movement between adjacent frames.

The final output of the algorithm is visualized in Figure 1. Video frames are converted to gray-scale and high-pass filtered (the result can be seen in Figure 3). Then adjacent frames are subtracted from each other, which detects image locations with significant changes in brightness. These points are then used to generate a Delaunay triangulation. Overly large triangles are removed from the mesh, and the resulting disconnected sub-graphs should ideally each cover a single car.
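As a rough illustration of this detection step, the Matlab sketch below high-pass filters two adjacent frames, thresholds their absolute difference, triangulates the changed pixels and prunes overly long triangle edges. The variable names, the LoG filter choice and all thresholds are my own assumptions, not the original code.

  % Hedged sketch of the change detection and triangulation step.
  % framePrev and frameCurr are assumed to be RGB webcam frames.
  hp = fspecial('log', 9, 1.5);                       % Laplacian-of-Gaussian as a high-pass filter
  prevG = imfilter(double(rgb2gray(framePrev)), hp, 'replicate');
  currG = imfilter(double(rgb2gray(frameCurr)), hp, 'replicate');
  d = abs(currG - prevG);                              % brightness change between adjacent frames
  [y, x] = find(d > 0.1 * max(d(:)));                  % points with a significant change (assumed threshold)
  tri = delaunay(x, y);                                % Delaunay triangulation of the change points

  % Remove triangles whose longest edge exceeds a limit; the remaining
  % disconnected sub-graphs should each cover roughly one car.
  maxEdge = 20;                                        % pixels, assumed
  keep = false(size(tri, 1), 1);
  for i = 1:size(tri, 1)
      px = x(tri(i, :));  py = y(tri(i, :));
      e = hypot(px - px([2 3 1]), py - py([2 3 1]));   % the three edge lengths
      keep(i) = max(e) <= maxEdge;
  end
  tri = tri(keep, :);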

This process can merge cars together if the distance between them is too small, as seen in Figure 2. Luckily this problem can be mitigated in data post-processing via standard statistical methods and tracker stability analysis. Additionally, overly large regions can be ignored altogether. Points and graphs are plotted in blue in Figure 1, and red boxes bound separate "regions of interest" (ROIs). These are then individually analyzed.
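Continuing the sketch above (it reuses x, y and tri), the grouping into ROIs could look roughly like the following; the graph/conncomp helpers and the size limits are my own choices for illustration, not the original implementation.

  % Group the remaining triangles into connected sub-graphs and take each
  % group's bounding box as a region of interest (ROI).
  edges = [tri(:, [1 2]); tri(:, [2 3]); tri(:, [1 3])];
  G = graph(edges(:, 1), edges(:, 2), [], numel(x));
  comp = conncomp(G);                                  % connected-component label for every point
  rois = zeros(0, 4);                                  % one [xmin ymin xmax ymax] row per ROI
  for c = 1:max(comp)
      idx = find(comp == c);
      if numel(idx) < 10, continue; end                % skip tiny clusters (assumed limit)
      box = [min(x(idx)), min(y(idx)), max(x(idx)), max(y(idx))];
      if (box(3) - box(1)) * (box(4) - box(2)) > 300 * 150, continue; end  % ignore overly large regions
      rois(end + 1, :) = box;                          %#ok<AGROW>
  end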

Figure 2: A problematic video frame with a bus counted as three separate cars, and a few cars merged together. The bus problem wouldn't occur with a better background separation algorithm.
Figure 3: A single car detection and velocity measurement example. The difference between frames is visualized in the lower left corner, and red points indicate local difference maxima. The image on the right visualizes the cross-correlation for different image rows and offsets; the mid-point is indicated in red and the maximum correlation by the green line.

The region of interest analysis steps are shown in Figure 3. Once a ROI's boundaries are determined, the corresponding region is extracted from the current and previous video frames. The previously generated images (converted to gray-scale and high-pass filtered) are re-used. Then the translation between these frames is efficiently, accurately and robustly determined by applying the phase correlation method in 1D: the Fourier transform is calculated for each row of both images, one spectrum is multiplied element-wise with the complex conjugate of the other (normalized to unit magnitude), and the inverse Fourier transform of the product is taken. The outcome of this is shown on the right side of Figure 3.
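A minimal sketch of this row-wise phase correlation, assuming roiPrev and roiCurr are the high-pass filtered gray-scale crops of the same ROI from the previous and current frames:

  F1 = fft(roiPrev, [], 2);                 % row-wise FFT of the previous-frame crop
  F2 = fft(roiCurr, [], 2);                 % row-wise FFT of the current-frame crop
  R  = F1 .* conj(F2);                      % cross-power spectrum per row
  R  = R ./ max(abs(R), eps);               % keep only the phase (normalise magnitude)
  corrRows = real(ifft(R, [], 2));          % one correlation curve per image row
  corrRows = fftshift(corrRows, 2);         % move the zero-offset column to the middle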

To determine the amount of translation between the images, the row-wise phase correlations are multiplied together, and the maximum value of the result is determined. The location of the maximum directly determines the amount and direction of the translation. This signal is shown in Figure 4, and it has a single very distinct peak. The method doesn't rely on any interest point extraction, it isn't fooled by partially repetitive patterns, and no voting scheme is needed for a robust outcome. Interest points were only used to detect and segment individual cars, not to actually track them.
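Continuing the sketch above, the row-wise curves can be combined and the peak located as follows. Figure 4 shows a column-wise sum, so a sum is used here; a product of the rows, as mentioned in the text, works in the same spirit, and either way the peak column gives the shift.

  corrSum = sum(corrRows, 1);                 % combine the per-row correlation curves
  [~, peak] = max(corrSum);                   % column of the maximum correlation
  center = floor(size(corrRows, 2) / 2) + 1;  % column of zero offset after fftshift
  shiftPx = peak - center;                    % signed horizontal motion in pixels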

Figure 4: A column-wise summation of the correlation values. It is clear that the motion of this car between the two frames is about 24 pixels. This could be converted to km/h by a simple calibration.
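Such a calibration could be as simple as the following, where the meters-per-pixel scale and the frame rate are made-up example numbers rather than measured values:

  metersPerPixel = 0.04;                    % assumed scale of the road at the camera's distance
  fps = 25;                                 % assumed webcam frame rate
  speedKmh = abs(shiftPx) * metersPerPixel * fps * 3.6;   % e.g. 24 px -> about 86 km/h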

Overall the system would benefit from a better background model and background separation algorithm, but this simple approach worked surprisingly well and is only about 200 lines of Matlab code. It isn't 100% accurate, but the most common errors can be detected and corrected by analyzing its output across multiple frames to find outliers and incorrect detections. If the traffic gets heavy and car speeds drop to a standstill, this method would see the road as empty since there is no movement. This issue, too, would be fixed by better background separation code.
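The kind of multi-frame correction hinted at above could, for one tracked car's per-frame pixel offsets, be sketched as a simple median-based outlier filter; the window length, the tolerance and the example data are all assumptions (medfilt1 is from the Signal Processing Toolbox):

  v = [23 24 24 3 25 24 47 24 23];          % example per-frame offsets with two outliers
  vSmooth = medfilt1(v, 5);                 % 5-frame median filter
  outlier = abs(v - vSmooth) > 8;           % assumed tolerance in pixels
  v(outlier) = vSmooth(outlier);            % replace the suspect measurements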


Related blog posts:

Bananagrams1
StableDiffusionBasics
Puzzles
SpeechMusicSeparation
VideoClustering