run_squad with roberta #2173

erenup · 2019-12-14T01:16:59Z

Hi @julien-c @thomwolf, this PR is based on #1386 and #1984.

This PR modifies `run_squad.py` and `modeling_roberta.py` to support RoBERTa.
It also uses multiprocessing to speed up converting examples to features: the conversion took 15 minutes before and now takes 34 seconds with 24 CPU cores. The number of threads (`--threads`) defaults to 1, which runs at the same speed as the original single-process code.
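The general pattern behind a `--threads` option like this can be sketched with the standard library's `multiprocessing.Pool`. This is a minimal illustration under stated assumptions, not the PR's actual code: `convert_example`, `convert_examples_to_features`, the whitespace tokenizer and the simple windowing are hypothetical simplifications. It shows three ideas: the tokenizer is handed to each worker once through a pool initializer, examples are mapped in input order, and `example_index`/`unique_id` are assigned afterwards in the parent, so they come out the same for any number of workers.

```python
from functools import partial
from multiprocessing import Pool

_tokenizer = None  # set once per worker process by _init_worker


def _init_worker(tokenizer):
    # Pool initializer: keep the tokenizer in a module-level global so it is
    # not re-sent with every task.
    global _tokenizer
    _tokenizer = tokenizer


def convert_example(example, max_len, stride):
    # Hypothetical, simplified conversion: split the context into windows of
    # at most max_len tokens, advancing the window start by stride tokens.
    tokens = _tokenizer(example["context"])
    features, start = [], 0
    while True:
        features.append({"qas_id": example["id"], "tokens": tokens[start:start + max_len]})
        if start + max_len >= len(tokens):
            break
        start += stride
    return features


def convert_examples_to_features(examples, tokenizer, max_len=384, stride=128, threads=1):
    worker = partial(convert_example, max_len=max_len, stride=stride)
    if threads <= 1:
        # Single-process path: same results, no worker processes.
        _init_worker(tokenizer)
        per_example = [worker(ex) for ex in examples]
    else:
        with Pool(threads, initializer=_init_worker, initargs=(tokenizer,)) as pool:
            # map() preserves input order, so results line up with examples.
            per_example = pool.map(worker, examples, chunksize=32)

    # Assign ids sequentially in the parent so they do not depend on how the
    # work was split across processes.
    features, unique_id = [], 1000000000  # arbitrary starting offset
    for example_index, example_features in enumerate(per_example):
        for feature in example_features:
            feature["example_index"] = example_index
            feature["unique_id"] = unique_id
            unique_id += 1
            features.append(feature)
    return features
```

On platforms that start workers with `spawn` (e.g. Windows, and macOS by default), the call must sit under `if __name__ == "__main__":` and the tokenizer must be picklable.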
The result of RoBERTa-large on SQuAD 1.1 is `{'exact': 87.26584673604542, 'f1': 93.77663586186483, 'total': 10570, 'HasAns_exact': 87.26584673604542, 'HasAns_f1': 93.77663586186483, 'HasAns_total': 10570, 'best_exact': 87.26584673604542, 'best_exact_thresh': 0.0, 'best_f1': 93.77663586186483, 'best_f1_thresh': 0.0}`, which, in a single run, is slightly lower than the result of #1386 (Add RoBERTa question answering & Update SQuAD runner to support RoBERTa).
Parameters: `python ./examples/run_squad.py --model_type roberta --model_name_or_path roberta-large --do_train --do_eval --do_lower_case --train_file data/squad1/train-v1.1.json --predict_file data/squad1/dev-v1.1.json --learning_rate 1.5e-5 --num_train_epochs 2 --max_seq_length 384 --doc_stride 128 --output_dir ./models_roberta/large_squad1 --per_gpu_eval_batch_size=3 --per_gpu_train_batch_size=3 --save_steps 10000 --warmup_steps=500 --weight_decay=0.01`. I hope this gap can be closed with `add_prefix_space=True`; I will run this comparison in the next few days.
The result of RoBERTa-base is `{'exact': 80.65279091769158, 'f1': 88.57296806525736, 'total': 10570, 'HasAns_exact': 80.65279091769158, 'HasAns_f1': 88.57296806525736, 'HasAns_total': 10570, 'best_exact': 80.65279091769158, 'best_exact_thresh': 0.0, 'best_f1': 88.57296806525736, 'best_f1_thresh': 0.0}`. RoBERTa-base was also tested since it is easier to reproduce.
The result of bert-base-uncased is `{'exact': 79.21475875118259, 'f1': 87.13734938098504, 'total': 10570, 'HasAns_exact': 79.21475875118259, 'HasAns_f1': 87.13734938098504, 'HasAns_total': 10570, 'best_exact': 79.21475875118259, 'best_exact_thresh': 0.0, 'best_f1': 87.13734938098504, 'best_f1_thresh': 0.0}`. This run checks whether multiprocessing affects other models: the result is identical to bert-base-uncased without multiprocessing.
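The `exact` and `f1` fields follow the standard SQuAD definitions. Every SQuAD 1.1 question is answerable, which is why `HasAns_*` equals the overall score and `HasAns_total` equals `total` (10570) in the dicts above. Below is a minimal re-implementation of those definitions, following the normalization of the official evaluation script (lower-case, strip punctuation and the articles a/an/the, collapse whitespace); it is a sketch for reference, not the PR's `squad_metrics.py` code.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(s):
    # Lower-case, drop punctuation, drop articles, collapse whitespace.
    s = "".join(ch for ch in s.lower() if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction, gold):
    return int(normalize_answer(prediction) == normalize_answer(gold))


def f1(prediction, gold):
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # An empty answer (SQuAD 2.0 "no answer") only matches another empty one.
        return float(pred_tokens == gold_tokens)
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(prediction, gold_answers):
    # Take the best match over all annotated gold answers for the question.
    return (max(exact_match(prediction, g) for g in gold_answers),
            max(f1(prediction, g) for g in gold_answers))
```

The reported numbers are these per-question scores averaged over the dev set and multiplied by 100.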
I hope someone else can help reproduce my results, thank you! I will keep looking for ways to improve the roberta-large results.
SQuAD1 model on Google Drive: roberta-large-finetuned-squad


codecov-io · 2019-12-14T01:24:01Z

Codecov Report

Merging #2173 into master will decrease coverage by 1.35%.
The diff coverage is 9.09%.

@@            Coverage Diff             @@
##           master    #2173      +/-   ##
==========================================
- Coverage   80.79%   79.43%   -1.36%     
==========================================
  Files         113      113              
  Lines       17013    17067      +54     
==========================================
- Hits        13745    13558     -187     
- Misses       3268     3509     +241

| Impacted Files | Coverage Δ | |
|---|---|---|
| transformers/data/metrics/squad_metrics.py | `0% <0%> (ø)` | ⬆️ |
| transformers/modeling_roberta.py | `53.2% <21.21%> (-18.57%)` | ⬇️ |
| transformers/data/processors/squad.py | `14.75% <5.5%> (+0.56%)` | ⬆️ |
| transformers/modeling_tf_pytorch_utils.py | `9.72% <0%> (-85.42%)` | ⬇️ |
| transformers/tests/modeling_tf_common_test.py | `80.51% <0%> (-16.42%)` | ⬇️ |
| transformers/modeling_xlnet.py | `72.21% <0%> (-2.33%)` | ⬇️ |
| transformers/modeling_ctrl.py | `94.27% <0%> (-2.21%)` | ⬇️ |
| transformers/modeling_openai.py | `80.13% <0%> (-1.33%)` | ⬇️ |

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7bd11dd...805c21a.

LysandreJik · 2019-12-16T20:07:16Z

transformers/data/processors/squad.py

+    sequence_added_tokens = tokenizer.max_len - tokenizer.max_len_single_sentence + 1 \
+        if 'roberta' in str(type(tokenizer)) else tokenizer.max_len - tokenizer.max_len_single_sentence


Good catch! We'll eventually have to think of an abstraction so that this method stays tokenizer-agnostic.

erenup: Cool, that would be better.
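Why RoBERTa needs the `+ 1`: as I read the processor, `sequence_added_tokens` is the number of special tokens that sit between the start of the sequence and the first context token, apart from the query itself. BERT's pair template `[CLS] A [SEP] B [SEP]` has two such tokens, while RoBERTa's `<s> A </s></s> B </s>` has three, yet both single-sentence templates add two tokens, so `max_len - max_len_single_sentence` is 2 for both. The sketch below models the two templates with plain lists (the `*_single`/`*_pair` builders and `tokens_before_context` are hypothetical helpers, not library functions) and counts the offset by locating a sentinel context token, one tokenizer-agnostic way to derive the number instead of checking the tokenizer's type.

```python
def bert_single(a):
    return ["[CLS]"] + a + ["[SEP]"]


def bert_pair(a, b):
    return ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]


def roberta_single(a):
    return ["<s>"] + a + ["</s>"]


def roberta_pair(a, b):
    # RoBERTa separates the two segments with a double </s></s>.
    return ["<s>"] + a + ["</s>", "</s>"] + b + ["</s>"]


def max_len_single_sentence(build_single, max_len):
    # Room left for real tokens once the single-sentence specials are added.
    return max_len - len(build_single([]))


def tokens_before_context(build_pair, query):
    # Build the pair around a sentinel context token and measure how many
    # special tokens precede it, excluding the query tokens themselves.
    sentinel = "<context>"
    return build_pair(query, [sentinel]).index(sentinel) - len(query)
```

With `max_len = 512`, `512 - max_len_single_sentence(...)` is 2 for both templates, while `tokens_before_context` gives 2 for BERT and 3 for RoBERTa, which is the one-token gap the `+ 1` patches.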

LysandreJik · 2019-12-16T20:14:32Z

transformers/modeling_roberta.py

+        tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
+        model = RobertaForMultipleChoice.from_pretrained('roberta-base')
+        input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)  # Batch size 1
+        start_positions = torch.tensor([1])
+        end_positions = torch.tensor([3])
+        outputs = model(input_ids, start_positions=start_positions, end_positions=end_positions)
+        loss, start_scores, end_scores = outputs[:2]


We should update this to an example similar to that of `BertForQuestionAnswering`.

erenup: I have updated the usage example. Could you please help with the failed check? I only added some comments, but it failed. I also ran `python -m pytest transformers/tests/modeling_roberta_test.py` and all tests passed. Thank you very much.

LysandreJik: There's currently an error with the test due to a segmentation fault. I'm fixing it in #2207, so don't worry about it here.
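A question-answering usage example ends with start and end scores over the input tokens. When start and end positions are passed, the model returns the loss first, so all three values come from `outputs[:3]` (the draft quoted above slices `outputs[:2]`, which cannot be unpacked into three names). Turning the scores into an answer span means choosing a start and an end; taking the two argmaxes independently can produce an end before the start. The sketch below, with a hypothetical `best_span` helper and an assumed maximum answer length of 30 tokens, searches only valid pairs. The script's real post-processing additionally discards spans outside the context and keeps an n-best list, which this sketch omits.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing start_scores[s] + end_scores[e]
    subject to s <= e < s + max_answer_len."""
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_scores):
        # Only consider ends at or after the start, within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best[2]:
                best = (s, e, score)
    return best


start = [0.1, 2.0, 0.3, 5.0]
end = [4.0, 0.2, 3.0, -1.0]
# Independent argmaxes would give start=3, end=0, an invalid span.
print(best_span(start, end))  # (1, 2, 5.0)
```

The double loop is quadratic in sequence length but bounded by `max_answer_len`, which is cheap for 384-token windows.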

dayihengliu · 2019-12-17T10:31:39Z

Really nice job!
Here are my results for RoBERTa-large on SQuAD 2.0 using this PR:
Results: `{'exact': 84.52792049187232, 'f1': 88.0216698977779, 'total': 11873, 'HasAns_exact': 80.90418353576248, 'HasAns_f1': 87.9017015344667, 'HasAns_total': 5928, 'NoAns_exact': 88.1412952060555, 'NoAns_f1': 88.1412952060555, 'NoAns_total': 5945, 'best_exact': 84.52792049187232, 'best_exact_thresh': 0.0, 'best_f1': 88.02166989777776, 'best_f1_thresh': 0.0}`
The hyper-parameters are as follows:
`python ./examples/run_squad.py --model_type roberta --model_name_or_path roberta-large --do_train --do_eval --do_lower_case --train_file data/squad2/train-v2.0.json --predict_file data/squad2/dev-v2.0.json --learning_rate 2e-5 --num_train_epochs 2 --max_seq_length 384 --doc_stride 128 --output_dir ./models_roberta/large_squad2 --per_gpu_eval_batch_size=6 --per_gpu_train_batch_size=6 --save_steps 10000 --warmup_steps=500 --weight_decay=0.01 --overwrite_cache --overwrite_output_dir --threads 24 --version_2_with_negative`
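On SQuAD 2.0 the overall scores average over both answerable (`HasAns`) and unanswerable (`NoAns`) questions, so each overall value is the count-weighted mean of the two subset values. Plugging in the numbers from the dict above reproduces them: 80.904% of 5928 is 4796 and 88.141% of 5945 is 5240, giving 10036 / 11873 ≈ 84.53% exact. The small check below (`combine` is a hypothetical helper) does the same for both metrics.

```python
def combine(*subsets):
    """Count-weighted mean of per-subset scores given as (score_percent, count)."""
    total = sum(count for _, count in subsets)
    return sum(score * count for score, count in subsets) / total


# HasAns and NoAns subsets from the SQuAD 2.0 run reported above.
overall_exact = combine((80.90418353576248, 5928), (88.1412952060555, 5945))
overall_f1 = combine((87.9017015344667, 5928), (88.1412952060555, 5945))
print(round(overall_exact, 4), round(overall_f1, 4))  # 84.5279 88.0217
```

This is also a quick sanity check when comparing runs: a change in the overall score can be traced to the answerable or the unanswerable subset.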

thomwolf · 2019-12-21T13:33:07Z

Really nice, thanks a lot @erenup

erenup added 9 commits October 3, 2019 18:33

- fixing for roberta tokenizer decoding (22e7c4e)
- Revert "fixing for roberta tokenizer decoding" (b5d7397): This reverts commit 22e7c4e.
- Merge branch 'huggingface/master' (86a6307)
- Merge remote-tracking branch 'refs/remotes/huggingface/master' (40ed717)
- initial version for roberta squad (9b312f9)
- add multiple processing (8e9526b)
- Merge remote-tracking branch 'refs/remotes/huggingface/master' (76f0d99)
- Merge branch 'refs/heads/squad_roberta' (c778070): # Conflicts: # transformers/data/processors/squad.py
- deleted useless file (a1faaf9)

erenup mentioned this pull request Dec 14, 2019 in "Add RoBERTa question answering & Update SQuAD runner to support RoBERTa" #1386 (Closed)

LysandreJik reviewed Dec 16, 2019


erenup added 3 commits December 17, 2019 11:18

- updated usage example in modeling_roberta for question and answering (3c6efd0)
- add comment for example_index and unique_id in single process (d000195)
- tried to fix the failed checks (805c21a)

julien-c changed the title ~~run_squa with roberta~~ run_squad with roberta Dec 20, 2019

thomwolf merged commit 18601c3 into huggingface:master Dec 21, 2019


5 participants