Thursday, November 12, 2015

Keep Track of Training and Testing Loss When Using Caffe

When using Caffe to train a deep neural network, how do you record the training/testing loss or accuracy across iterations?

I know the following works, though without understanding the details.

Run in terminal:

 $ caffe_root/build/tools/caffe train --solver=solver.prototxt 2>&1 | tee mylog.log  

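Notice the appended part, "2>&1 | tee mylog.log". Caffe writes its log messages to stderr, so "2>&1" merges stderr into stdout, and "tee" prints that stream in the terminal while also saving it to "mylog.log".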
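To make the redirect-and-tee step a bit less mysterious, here is a minimal Python sketch of the same idea. It is only an illustration, not part of Caffe: the function name run_and_tee is my own, and it assumes the command prints line-oriented text.

```python
import subprocess
import sys


def run_and_tee(cmd, log_path):
    """Run cmd, echo its merged stdout/stderr, and also save it to log_path.

    Hypothetical helper for illustration; it mimics `cmd 2>&1 | tee log_path`.
    """
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same role as 2>&1
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)  # show the line in the terminal
            log.write(line)         # same role as tee mylog.log
        return proc.wait()          # exit code of the command
```

Under those assumptions, calling run_and_tee(["caffe_root/build/tools/caffe", "train", "--solver=solver.prototxt"], "mylog.log") should leave the same kind of log file as the shell command above.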

Then run in terminal,

 $ python caffe_root/tools/extra/parse_log.py mylog.log ./  

You will now find two files under ./: mylog.log.train and mylog.log.test. They are CSV files that record the training and testing loss.

Then you can run gnuplot to quickly visualize the loss/accuracy over the iterations. But first you need to comment out the first line (the header) of each of the two files by prefixing it with #.

The .train (or .test) file is of the following form:

#NumIters,Seconds,LearningRate,accuracy,loss
0,****,****,****,****
100,****,****,****,****
.
.
.
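Commenting out the header by hand gets tedious after a few runs, so here is a small sketch that does it in Python. The function name comment_out_header is my own, and it assumes the files are the plain CSV output described above.

```python
from pathlib import Path


def comment_out_header(path):
    """Prefix the first line of a parse_log.py output file with '#'.

    Returns True if the file was changed, False if it was empty or the
    header was already commented, so running it twice is harmless.
    """
    p = Path(path)
    lines = p.read_text().splitlines(keepends=True)
    if not lines or lines[0].startswith("#"):
        return False
    lines[0] = "#" + lines[0]
    p.write_text("".join(lines))
    return True


if __name__ == "__main__":
    # File names follow the example in this post.
    for name in ("mylog.log.train", "mylog.log.test"):
        if Path(name).exists():
            comment_out_header(name)
```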

Then in terminal run,
 $ gnuplot  
 gnuplot> set datafile separator ','  
 gnuplot> plot 'mylog.log.train' using 1:4 with line # accuracy throughout the iterations  
 gnuplot> plot 'mylog.log.train' using 1:5 with line # loss throughout the iterations  
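The column numbers in "using 1:4" and "using 1:5" only make sense if the header really has accuracy and loss in those positions, and that depends on what your network outputs. Here is a rough sketch, using only Python's standard csv module, for checking which column is which before plotting. The names read_columns and gnuplot_index are my own, and the code assumes every data value is numeric.

```python
import csv


def read_columns(path):
    """Read a parse_log.py output file into {column name: list of floats}."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    if not rows:
        return {}
    # The header may or may not have been commented out with '#'.
    header = [name.strip().lstrip("#").strip() for name in rows[0]]
    columns = {name: [] for name in header}
    for row in rows[1:]:
        if len(row) != len(header):
            continue  # skip blank or malformed lines
        for name, value in zip(header, row):
            columns[name].append(float(value))
    return columns


def gnuplot_index(columns, name):
    """Return the 1-based column number to put in gnuplot's `using 1:N`."""
    return list(columns).index(name) + 1
```

For example, gnuplot_index(read_columns("mylog.log.train"), "loss") gives the number to use in place of 5 in the plot command above.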

4 comments:

  1. Thanks man, this is exactly what I've been looking for, you wrapped everything up so well!

  2. Hi Jiaji, I tried to run the script you mentioned, but it resulted in an error:

    warning: Skipping data file with no valid points
    ^
    x range is invalid

    Do you have any solutions?

    Thanks

    Replies
    1. Hello YuaN, Good day :)
      I had the same warning message and found that separating the columns with tabs and arranging them in a specific format like this helps:

      0.0, 2.154145, 0.001, 2.30202
      100.0, 7.46129, 0.001, 1.72144
      200.0, 12.813951, 0.001, 1.64644
      300.0, 18.197415, 0.001, 1.34903
      400.0, 23.545648, 0.001, 1.2009
      500.0, 31.0961, 0.001, 1.26333
      600.0, 36.51152, 0.001, 1.34443
      700.0, 41.869304, 0.001, 1.16988

      instead of this:

      0.0,2.154145,0.001,2.30202
      100.0,7.46129,0.001,1.72144
      200.0,12.813951,0.001,1.64644
      300.0,18.197415,0.001,1.34903
      400.0,23.545648,0.001,1.2009
      500.0,31.0961,0.001,1.26333
      600.0,36.51152,0.001,1.34443
      700.0,41.869304,0.001,1.16988

    2. Not sure why the tabs don't appear in comments but hope you get what I mean.
