This is based on my [ImageCaptioning.pytorch](https://github.com/ruotianluo/ImageCaptioning.pytorch).

## Requirements

- Python 2.7 (because there is no [coco-caption](https://github.com/tylin/coco-caption) version for Python 3)
- PyTorch 1.3 (along with torchvision)
- cider (already added as a submodule)
- coco-caption (already added as a submodule)
- yacs

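
Since cider and coco-caption are pulled in as submodules, they need to be fetched explicitly. A minimal sketch, assuming a standard git checkout (the repository URL is a placeholder):

```bash
# Clone together with the cider and coco-caption submodules
git clone --recurse-submodules <url-of-this-repo>
# or, inside an existing clone:
git submodule update --init --recursive
```
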
(**Skip if you are using bottom-up features**): If you want to use resnet to extract image features, you need to download a pretrained resnet model for both training and evaluation. The models can be downloaded from [here](https://drive.google.com/open?id=0B7fNdx_jAqhtbVYzOURMdDNHSGM) and should be placed in `data/imagenet_weights`.
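
The expected layout would then look roughly like the following; the weight filename is an assumption, use whatever files the download actually contains:

```bash
mkdir -p data/imagenet_weights
# place the downloaded weights here, e.g.
#   data/imagenet_weights/resnet101.pth
```
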
## Pretrained models

Check out `MODEL_ZOO.md`.

If you want to do evaluation only, you can follow [this section](#generate-image-captions) after downloading the pretrained models (and also the pretrained resnet101 or precomputed bottom-up features).

## Train your own network on COCO/Flickr30k

We now support both flickr30k and COCO. See details in `data/README.md`.

The train script will dump checkpoints into the folder specified by `--checkpoint_path` (default = `log_$id/`). By default, only the best-performing checkpoint on validation and the latest checkpoint are saved, to save disk space. You can also set `--save_history_ckpt` to 1 to save every checkpoint.
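
As a rough illustration of how these flags fit together, here is a hedged sketch of a training invocation; the script name `train.py` and the config file `configs/fc.yml` are assumptions, and `opts.py` is the authoritative list of options:

```bash
# Sketch only: train a model with id "fc"; checkpoints go to log_fc,
# every checkpoint is kept, and BLEU/METEOR/CIDEr are computed on the validation split.
python train.py --id fc --cfg configs/fc.yml \
    --checkpoint_path log_fc \
    --save_history_ckpt 1 \
    --language_eval 1
```
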
To resume training, you can specify the `--start_from` option to be the path containing `infos.pkl` and `model.pth` (usually you can just set `--start_from` and `--checkpoint_path` to be the same).
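
For example (directory and script names are placeholders):

```bash
# Resume from the checkpoints in log_fc and keep writing to the same directory
python train.py --id fc --start_from log_fc --checkpoint_path log_fc
```
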
To check the training and validation curves, you can use tensorboard. The loss histories are automatically dumped into `--checkpoint_path`.
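
A typical way to view them, where the log directory is whatever you passed as `--checkpoint_path`:

```bash
tensorboard --logdir log_fc
```
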
The current command uses scheduled sampling; you can set `--scheduled_sampling_start` to -1 to turn off scheduled sampling.

If you'd like to evaluate BLEU/METEOR/CIDEr scores during training in addition to the validation cross-entropy loss, use the `--language_eval 1` option, but don't forget to pull the submodule `coco-caption`.

All of the arguments can also be specified in a yaml file and loaded with `--cfg`. Configurations given on the command line overwrite those in the cfg file if there are conflicts.
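
A hypothetical illustration of that interaction; the yaml keys are assumed to mirror the option names in `opts.py`:

```bash
cat > my_config.yml <<'EOF'
# assumed option names, matching the command-line flags
id: fc
batch_size: 16
language_eval: 1
EOF
# the command-line value (32) wins over the batch_size in the yaml file
python train.py --cfg my_config.yml --batch_size 32
```
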
For more options, see `opts.py`.

<!--**A few notes on training.** To give you an idea, with the default settings one epoch of MS COCO images is about 11000 iterations. After 1 epoch of training results in validation loss ~2.5 and CIDEr score of ~0.68. By iteration 60,000 CIDEr climbs up to about ~0.84 (validation loss at about 2.4 (under scheduled sampling)).-->

### Train using self critical

`$ bash scripts/copy_model.sh fc fc_rl`

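
The command above presumably seeds a new run directory for the `fc_rl` id from the cross-entropy model `fc`. A hedged sketch of continuing with self-critical training from that copy; `--self_critical_after` and the `log_fc_rl` directory name are assumptions not stated in this section:

```bash
# Warm-start from the copied cross-entropy checkpoints and switch to
# self-critical training after epoch 30 (see the note below).
python train.py --id fc_rl --start_from log_fc_rl --checkpoint_path log_fc_rl \
    --self_critical_after 30
```
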
**A few notes on training.** Starting self-critical training after 30 epochs, the CIDEr score goes up to 1.05 after 600k iterations (including the 30 epochs of pretraining).

You will see a huge boost on CIDEr score, :).

## Generate image captions
### Evaluate on raw images

**Note**: this doesn't work for models trained with bottom-up features.

Now place all your images of interest into a folder, e.g. `blah`, and run the eval script:
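
A sketch of what that call might look like; the flag names (`--image_folder`, `--num_images`, `--infos_path`) are assumptions and should be checked against `eval.py`:

```bash
# Caption the first 10 images found in the folder "blah" with a downloaded model
python eval.py --model model.pth --infos_path infos.pkl \
    --image_folder blah --num_images 10
```
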
The default split to evaluate is test. The default inference method is greedy decoding.

**Beam Search**. Beam search can improve over greedy decoding by roughly 5%, but it is a little more expensive. To turn on beam search, use `--beam_size N`, where N should be greater than 1.
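
For example, reusing the eval call sketched above with a beam width of 5:

```bash
python eval.py --model model.pth --infos_path infos.pkl \
    --image_folder blah --beam_size 5
```
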
You can download the preprocessed files `cocotest.json`, `cocotest_bu_att` and `cocotest_bu_fc` from [this link](https://drive.google.com/open?id=1eCdz62FAVCGogOuNhy87Nmlo5_I0sH2J).
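
How these preprocessed COCO test-set files are wired into evaluation is not spelled out here; a hedged guess, where every flag not mentioned elsewhere in this README is an assumption:

```bash
# Evaluate on the COCO test set with the downloaded preprocessed features.
# --input_json / --input_att_dir / --input_fc_dir are assumed option names;
# --language_eval 0 because the test set has no public ground-truth captions.
python eval.py --model model.pth --infos_path infos.pkl \
    --input_json cocotest.json \
    --input_att_dir cocotest_bu_att --input_fc_dir cocotest_bu_fc \
    --language_eval 0
```
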
## Miscellanea

**Using cpu**. The code currently runs on gpu by default; there is not even an option for switching. If someone really needs a cpu model, please open an issue; I could potentially create a cpu checkpoint and modify eval.py to run the model on cpu. However, there's no point in using cpus to train the model.

**Train on other datasets**. It should be trivial to port if you can create a file like `dataset_coco.json` for your own dataset.