
Commit 8024c33

Rewrite the readme.
1 parent fdf9fde commit 8024c33

README.md

Lines changed: 50 additions & 32 deletions
# Neuraltalk2-pytorch

Changes compared to neuraltalk2:
- Instead of using a random split, we use [karpathy's train-val-test split](http://cs.stanford.edu/people/karpathy/deepimagesent/caption_datasets.zip).
- Instead of including the convnet in the model, we use preprocessed features (a finetuneable CNN version is in the **with_finetune** branch).
- Use resnet instead of vgg; the feature extraction method is the same as in self-critical: run the CNN on the original image and adaptively average-pool the last conv layer feature to a fixed size (see the sketch after this list).
- Many more models (check out the `models` folder). The latest topdown model can achieve a 1.07 CIDEr score on Karpathy's test split with beam size 5.

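To picture the pooling step, here is a minimal sketch written against a recent PyTorch/torchvision API rather than the repo's actual `scripts/prepro_feats.py`; the `extract_features` helper and the 14x14 attention grid are assumptions made for the illustration.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

# ImageNet-pretrained resnet101, truncated before the final avgpool/fc layers
# so that it outputs the last conv feature map.
resnet = models.resnet101(pretrained=True)
resnet.eval()
conv_body = torch.nn.Sequential(*list(resnet.children())[:-2])

normalize = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_features(image_path, att_size=14):
    """Return (fc_feat, att_feat) for a single image.

    fc_feat:  (2048,)                     global average of the conv feature map
    att_feat: (att_size, att_size, 2048)  conv map adaptively pooled to a fixed grid
    """
    img = Image.open(image_path).convert('RGB')
    x = normalize(img).unsqueeze(0)                # (1, 3, H, W), original resolution
    with torch.no_grad():
        conv = conv_body(x)                        # (1, 2048, H/32, W/32)
        fc_feat = conv.mean(3).mean(2).squeeze(0)  # global average pooling
        att_feat = F.adaptive_avg_pool2d(conv, (att_size, att_size))
        att_feat = att_feat.squeeze(0).permute(1, 2, 0).contiguous()
    return fc_feat, att_feat
```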
## Requirements
- Python 2.7 (because there is no [coco-caption](https://github.com/tylin/coco-caption) version for python 3)
- PyTorch 0.2 (along with torchvision)

You need to download a pretrained resnet model for both training and evaluation. The models can be downloaded from [here](https://drive.google.com/open?id=0B7fNdx_jAqhtbVYzOURMdDNHSGM) and should be placed in `data/imagenet_weights`.

## Pretrained models
Pretrained models are provided [here](https://drive.google.com/open?id=0B7fNdx_jAqhtcXp0aFlWSnJmb0k), and the performance of each model is tracked in this [issue](https://github.com/ruotianluo/neuraltalk2.pytorch/issues/10).

If you only want to run evaluation, you can skip to [this section](#generate-image-captions) after downloading the pretrained models.

## Train your own network on COCO

### Download COCO dataset and preprocessing

First, download the COCO images from [this link](http://mscoco.org/dataset/#download). We need the 2014 training images and 2014 validation images. Put `train2014/` and `val2014/` in the same directory, denoted as `$IMAGE_ROOT` below.

Download the preprocessed COCO captions from [Karpathy's homepage](http://cs.stanford.edu/people/karpathy/deepimagesent/caption_datasets.zip). Extract `dataset_coco.json` from the zip file and copy it into `data/`. This file provides preprocessed captions and the standard train-val-test split.

Once we have these, we can invoke the `prepro_*.py` scripts, which read all of this in and create the dataset (two feature folders, an hdf5 label file, and a json file):
```bash
$ python scripts/prepro_labels.py --input_json data/dataset_coco.json --output_json data/cocotalk.json --output_h5 data/cocotalk
$ python scripts/prepro_feats.py --input_json data/dataset_coco.json --output_dir data/cocotalk --images_root $IMAGE_ROOT
```

`prepro_labels.py` maps all words that occur <= 5 times to a special `UNK` token and builds a vocabulary of the remaining words. The image information and vocabulary are dumped into `data/cocotalk.json`, and the discretized caption data are dumped into `data/cocotalk_label.h5`.
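As an illustration only (not the actual `scripts/prepro_labels.py` code), the thresholding could look roughly like the sketch below; the `build_vocab` helper and its `count_thr` argument are names invented for this example.

```python
from collections import Counter

def build_vocab(captions, count_thr=5):
    """captions: a list of tokenized captions, e.g. [['a', 'dog', 'runs'], ...].

    Words seen count_thr times or fewer are replaced by a single 'UNK' token.
    """
    counts = Counter(w for caption in captions for w in caption)
    vocab = [w for w, n in counts.items() if n > count_thr]
    vocab.append('UNK')  # one token standing in for every rare word

    def translate(caption):
        return [w if counts[w] > count_thr else 'UNK' for w in caption]

    return vocab, translate

# Example: 'quokka' occurs only once, so it is encoded as UNK.
vocab, translate = build_vocab([['a', 'dog', 'runs']] * 6 + [['a', 'quokka', 'sits']])
print(translate(['a', 'quokka', 'runs']))  # ['a', 'UNK', 'runs']
```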
`prepro_feats.py` extracts the resnet101 features (both the fc feature and the last conv feature) of each image. The features are saved in `data/cocotalk_fc` and `data/cocotalk_att`; the resulting files take about 200GB.

(Check the prepro scripts for more options, like other resnet models or other attention sizes.)

**Warning**: the prepro script will fail with the default MSCOCO data because one of their images is corrupted. See [this issue](https://github.com/karpathy/neuraltalk2/issues/4) for the fix; it involves manually replacing one image in the dataset.

### Start training

```bash
$ python train.py --id st --caption_model show_tell --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_st --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 25
```

The train script will dump checkpoints into the folder specified by `--checkpoint_path` (default = `save/`). To save disk space, only the best-performing checkpoint on validation and the latest checkpoint are kept.

To resume training, set the `--start_from` option to the path containing `infos.pkl` and `model.pth` (usually you can just set `--start_from` and `--checkpoint_path` to the same directory).
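As a hedged illustration of what such a checkpoint directory holds (assuming the usual convention that `model.pth` is a state_dict and `infos.pkl` is a pickled dict of training metadata, and using the `log_st` path from the command above), you could inspect it like this:

```python
import pickle
import torch

# Paths assumed from the training command above (--checkpoint_path log_st).
with open('log_st/infos.pkl', 'rb') as f:
    infos = pickle.load(f)          # whatever metadata train.py saved alongside the model

state_dict = torch.load('log_st/model.pth', map_location='cpu')
print(sorted(infos.keys()))         # see what metadata was saved
print(len(state_dict), 'parameter tensors in the model checkpoint')
```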

If you have tensorflow installed, the loss histories are automatically dumped into `--checkpoint_path` and can be visualized with tensorboard.

The command above uses scheduled sampling; you can set `--scheduled_sampling_start` to -1 to turn scheduled sampling off.
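For intuition only (this is not the repo's training loop), scheduled sampling means that at each decoding step the model is sometimes fed its own previous prediction instead of the ground-truth word, with a probability that is ramped up over training; the tiny decoder below is made up for the example, and `ss_prob` mirrors common naming but is an assumption here.

```python
import torch
import torch.nn as nn

class TinyCaptioner(nn.Module):
    """Minimal step-wise decoder, used only to illustrate scheduled sampling."""
    def __init__(self, vocab_size=100, hidden=64):
        super(TinyCaptioner, self).__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.cell = nn.GRUCell(hidden, hidden)
        self.logit = nn.Linear(hidden, vocab_size)

    def forward(self, gt_tokens, ss_prob=0.0):
        batch, seq_len = gt_tokens.size()
        h = torch.zeros(batch, self.cell.hidden_size)
        it = gt_tokens[:, 0]                  # always feed the ground-truth start token
        outputs = []
        for t in range(seq_len):
            h = self.cell(self.embed(it), h)
            logits = self.logit(h)
            outputs.append(logits)
            if t + 1 < seq_len:
                # With probability ss_prob, feed the model's own sample instead of
                # the next ground-truth word -- this is the scheduled sampling step.
                sampled = torch.distributions.Categorical(logits=logits).sample()
                use_sample = torch.rand(batch) < ss_prob
                it = torch.where(use_sample, sampled, gt_tokens[:, t + 1])
        return torch.stack(outputs, dim=1)    # (batch, seq_len, vocab_size)

tokens = torch.randint(0, 100, (2, 5))        # a fake batch of caption word indices
logits = TinyCaptioner()(tokens, ss_prob=0.25)
```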

If you'd like to evaluate BLEU/METEOR/CIDEr scores during training in addition to the validation cross-entropy loss, use the `--language_eval 1` option, but don't forget to download the [coco-caption code](https://github.com/tylin/coco-caption) into the `coco-caption` directory.

For more options, see `opts.py`.

**A few notes on training.** To give you an idea, with the default settings one epoch of MS COCO images is about 11000 iterations. One epoch of training results in a validation loss of ~2.5 and a CIDEr score of ~0.68. By iteration 60,000, CIDEr climbs to about 0.84 (validation loss about 2.4, under scheduled sampling).

## Generate image captions

### Evaluate on raw images
Now place all your images of interest into a folder, e.g. `blah`, and run
the eval script:

```bash
$ python eval.py --model model.pth --infos_path infos.pkl --image_folder blah --num_images 10
```

This tells the `eval` script to run up to 10 images from the given folder. If you have a big GPU you can speed up the evaluation by increasing `batch_size`. Use `--num_images -1` to process all images. The eval script will create a `vis.json` file inside the `vis` folder, which can then be visualized with the provided HTML interface:

```bash
$ cd vis
$ python -m SimpleHTTPServer
```

Now visit `localhost:8000` in your browser and you should see your predicted captions.
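If you prefer to inspect the predictions without the HTML viewer, something like the snippet below works; it assumes each entry in `vis.json` carries an `image_id` and a `caption` field (which is what the viewer displays), so adjust the keys if your file differs.

```python
import json

# Print the captions dumped by eval.py into vis/vis.json.
with open('vis/vis.json') as f:
    predictions = json.load(f)

for entry in predictions:
    print(entry.get('image_id'), '->', entry.get('caption'))
```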

### Evaluate on Karpathy's test split

```bash
$ python eval.py --dump_images 0 --num_images 5000 --model model.pth --infos_path infos.pkl --language_eval 1
```

The default split to evaluate is test. The default inference method is greedy decoding (`--sample_max 1`); to sample from the posterior instead, set `--sample_max 0`.

**Beam search**. Beam search can improve over greedy decoding by roughly 5%, at the cost of slower inference. To turn on beam search, use `--beam_size N` with N greater than 1.
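As a bare-bones illustration of what beam search does (this is not the decoder used in this repo), the sketch below keeps the N highest-scoring partial sequences at each step; the `step_log_probs` interface and the toy scorer are assumptions made for the example.

```python
import math

def beam_search(step_log_probs, beam_size, max_len, bos=0, eos=1):
    """Generic beam search over a step function.

    step_log_probs(prefix) -> {token: log_prob} scores possible next tokens
    given a partial sequence (a list of token ids starting with bos).
    Returns the highest-scoring finished sequence and its log-probability.
    """
    beams = [([bos], 0.0)]                  # (prefix, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, logp in step_log_probs(prefix).items():
                candidates.append((prefix + [token], score + logp))
        # Keep only the beam_size best partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            (finished if prefix[-1] == eos else beams).append((prefix, score))
        if not beams:                       # every surviving candidate has ended
            break
    finished.extend(beams)                  # sequences that never emitted eos
    return max(finished, key=lambda c: c[1])

# Toy scorer: prefers token 2 for the first steps, then strongly prefers ending.
def toy_step(prefix):
    if len(prefix) < 3:
        return {2: math.log(0.6), 3: math.log(0.3), 1: math.log(0.1)}
    return {1: math.log(0.9), 2: math.log(0.1)}

print(beam_search(toy_step, beam_size=3, max_len=6))
```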
## Miscellanea
**Using CPU**. The code currently assumes a GPU; there is not even an option for switching. If you really need a CPU model, please open an issue; I can potentially create a CPU checkpoint and modify eval.py to run the model on CPU. However, there is little point in training the model on CPU.

**Training on other datasets**. It should be trivial to port if you can create a file like `dataset_coco.json` for your own dataset.
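For orientation, the downloaded file is roughly shaped like the Python structure below; the field names are recalled from Karpathy's release rather than taken from a spec, and the filename/ids are placeholders, so verify everything against the actual `dataset_coco.json` (and against what the prepro scripts read) before building your own.

```python
# Approximate layout of dataset_coco.json; filename and ids below are placeholders.
dataset = {
    "dataset": "coco",
    "images": [
        {
            "filepath": "val2014",                 # sub-folder under $IMAGE_ROOT
            "filename": "COCO_val2014_000000000042.jpg",
            "cocoid": 42,
            "split": "val",                        # train / val / test / restval
            "sentences": [
                {
                    "raw": "A dog runs on the beach.",
                    "tokens": ["a", "dog", "runs", "on", "the", "beach"],
                },
                # ... typically five captions per image
            ],
        },
        # ... one entry per image
    ],
}
```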
**Live demo**. Not supported for now; pull requests are welcome.
## Acknowledgements

Thanks to the original [neuraltalk2](https://github.com/karpathy/neuraltalk2) and the awesome PyTorch team.
