Commit 1c90c07: Update readme

1 parent 1cbedaf

1 file changed: README.md (+39 -12 lines)
This is based on my [ImageCaptioning.pytorch](https://github.com/ruotianluo/ImageCaptioning.pytorch) […]

## Requirements

- Python 2.7 (because there is no [coco-caption](https://github.com/tylin/coco-caption) version for Python 3)
- PyTorch 1.3 (along with torchvision)
- cider (already added as a submodule)
- coco-caption (already added as a submodule)
- yacs
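Since `cider` and `coco-caption` are tracked as submodules, remember to fetch them inside your clone, e.g.:

```bash
# pull the cider and coco-caption submodules into an existing clone
$ git submodule update --init --recursive
```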
(**Skip if you are using bottom-up features**): If you want to use resnet to extract image features, you need to download a pretrained resnet model for both training and evaluation. The models can be downloaded from [here](https://drive.google.com/open?id=0B7fNdx_jAqhtbVYzOURMdDNHSGM) and should be placed in `data/imagenet_weights`.
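For example (a sketch; the exact weight file name inside the Drive folder is an assumption):

```bash
# place the downloaded resnet weights where the code expects them
$ mkdir -p data/imagenet_weights
$ mv ~/Downloads/resnet101.pth data/imagenet_weights/
```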
## Pretrained models

Check out `MODEL_ZOO.md`.

If you want to do evaluation only, you can then follow [this section](#generate-image-captions) after downloading the pretrained models (and also the pretrained resnet101 or the precomputed bottom-up features).
## Train your own network on COCO/Flickr30k
[…]

We now support both flickr30k and COCO. See details in `data/README.md`. (Note: […])

### Start training
```bash
$ python train.py --id fc --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_fc --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30
```

or

```bash
$ python train.py --cfg configs/fc.yml --id fc
```

The train script will dump checkpoints into the folder specified by `--checkpoint_path` (default = `log_$id/`). By default, only the best-performing checkpoint on validation and the latest checkpoint are saved, to save disk space. You can also set `--save_history_ckpt` to 1 to save every checkpoint.
To resume training, specify the `--start_from` option as the path containing `infos.pkl` and `model.pth` (usually you can just set `--start_from` and `--checkpoint_path` to the same folder).
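For example, to resume the `fc` run above (a sketch; the flags repeat the training command, with `--start_from` pointing at the existing checkpoint folder):

```bash
$ python train.py --id fc --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --checkpoint_path log_fc --start_from log_fc
```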

To check the training and validation curves, you can use tensorboard; the loss histories are automatically dumped into `--checkpoint_path`.

The current command uses scheduled sampling; you can set `--scheduled_sampling_start` to -1 to turn scheduled sampling off.
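For example, assuming the `log_fc` checkpoint path used above:

```bash
# point tensorboard at the checkpoint folder holding the dumped loss histories
$ tensorboard --logdir log_fc
```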
If you'd like to evaluate BLEU/METEOR/CIDEr scores during training in addition to the validation cross-entropy loss, use the `--language_eval 1` option, but don't forget to pull the `coco-caption` submodule.

For all the arguments, you can specify them in a yaml file and pass it with `--cfg`; arguments given on the command line overwrite the cfg file when they conflict.
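As an illustration, a cfg file simply mirrors the command-line flag names (hypothetical contents; the actual `configs/fc.yml` in the repo may differ):

```yaml
# hypothetical excerpt of configs/fc.yml: keys mirror the flag names in opts.py
caption_model: newfc
input_json: data/cocotalk.json
input_fc_dir: data/cocotalk_fc
input_att_dir: data/cocotalk_att
input_label_h5: data/cocotalk_label.h5
batch_size: 10
learning_rate: 5e-4
max_epochs: 30
```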

For more options, see `opts.py`.

<!-- **A few notes on training.** To give you an idea, with the default settings one epoch of MS COCO images is about 11000 iterations. After 1 epoch of training results in validation loss ~2.5 and CIDEr score of ~0.68. By iteration 60,000 CIDEr climbs up to about ~0.84 (validation loss at about 2.4 (under scheduled sampling)). -->

### Train using self critical

[…]

```bash
$ bash scripts/copy_model.sh fc fc_rl
```

Then
```bash
$ python train.py --id fc_rl --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-5 --start_from log_fc_rl --checkpoint_path log_fc_rl --save_checkpoint_every 6000 --language_eval 1 --val_images_use 5000 --self_critical_after 30 --cached_tokens coco-train-idxs --max_epoch 50
```

or
```bash
$ python train.py --cfg configs/fc_rl.yml --id fc_rl
```
You will see a huge boost on the CIDEr score :).
**A few notes on training.** Starting self-critical training after 30 epochs, the CIDEr score goes up to 1.05 after 600k iterations (including the 30 epochs of pretraining).
## Generate image captions

### Evaluate on raw images

**Note**: this doesn't work for models trained with bottom-up features.

Now place all your images of interest into a folder, e.g. `blah`, and run the eval script:
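A sketch of the eval invocation (the diff elides the exact command here; `--image_folder` and the `model.pth`/`infos.pkl` checkpoint names are assumptions consistent with the rest of this README):

```bash
$ python eval.py --model model.pth --infos_path infos.pkl --image_folder blah --num_images 10
```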

[…]

The default split to evaluate is test. The default inference method is greedy decoding […]
**Beam Search**. Beam search can increase performance over greedy decoding by ~5%, but it is a little more expensive. To turn on beam search, use `--beam_size N`, where N is greater than 1.
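For example, adding beam search of width 5 to the raw-image eval sketch above:

```bash
$ python eval.py --model model.pth --infos_path infos.pkl --image_folder blah --beam_size 5
```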
### Evaluate on COCO test set

```bash
$ python eval.py --input_json cocotest.json --input_fc_dir data/cocotest_bu_fc --input_att_dir data/cocotest_bu_att --input_label_h5 none --num_images -1 --model model.pth --infos_path infos.pkl --language_eval 0
```

You can download the preprocessed files `cocotest.json`, `cocotest_bu_att`, and `cocotest_bu_fc` from [this link](https://drive.google.com/open?id=1eCdz62FAVCGogOuNhy87Nmlo5_I0sH2J).
## Miscellanea
**Using cpu**. The code currently uses the gpu by default; there is no option for switching. If someone really needs a cpu model, please open an issue; I can potentially create a cpu checkpoint and modify `eval.py` to run the model on cpu. However, there is no point in using cpus to train the model.

**Train on other dataset**. It should be trivial to port if you can create a file like `dataset_coco.json` for your own dataset.
