run_squad with roberta #2173
Conversation
This reverts commit 22e7c4e.
# Conflicts:
#	transformers/data/processors/squad.py
Codecov Report
@@ Coverage Diff @@
## master #2173 +/- ##
==========================================
- Coverage 80.79% 79.43% -1.36%
==========================================
Files 113 113
Lines 17013 17067 +54
==========================================
- Hits 13745 13558 -187
- Misses 3268 3509 +241
Continue to review full report at Codecov.
sequence_added_tokens = tokenizer.max_len - tokenizer.max_len_single_sentence + 1 \
    if 'roberta' in str(type(tokenizer)) else tokenizer.max_len - tokenizer.max_len_single_sentence
Good catch! We'll eventually have to think of an abstraction so that this method stays tokenizer-agnostic.
Cool. That would be better.
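For illustration, a minimal sketch of what such a tokenizer-agnostic abstraction could look like, assuming the tokenizer exposes a method like num_special_tokens_to_add (present in later releases of the library); the helper name is hypothetical, and the fallback mirrors this PR's expression:

```python
# Hypothetical helper: ask the tokenizer how many special tokens it adds
# instead of string-matching its class name.
def num_added_tokens(tokenizer, pair=False):
    if hasattr(tokenizer, "num_special_tokens_to_add"):
        # Available in later transformers releases.
        return tokenizer.num_special_tokens_to_add(pair=pair)
    # Fallback: the length budget minus the single-sentence budget,
    # with the RoBERTa-specific +1 from this PR.
    extra = 1 if 'roberta' in str(type(tokenizer)) else 0
    return tokenizer.max_len - tokenizer.max_len_single_sentence + extra
```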
transformers/modeling_roberta.py (Outdated)
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaForMultipleChoice.from_pretrained('roberta-base')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)  # Batch size 1
start_positions = torch.tensor([1])
end_positions = torch.tensor([3])
outputs = model(input_ids, start_positions=start_positions, end_positions=end_positions)
loss, start_scores, end_scores = outputs[:3]
We should update this to an example similar to that of BertForQuestionAnswering.
I have updated the usage example. Could you please help with the failed check? I only added some comments, but the check failed. I also ran python -m pytest transformers/tests/modeling_roberta_test.py and all tests passed. Thank you very much.
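For reference, the updated example presumably mirrors the BertForQuestionAnswering docstring along these lines (a sketch, not the exact merged text):

```python
import torch
from transformers import RobertaTokenizer, RobertaForQuestionAnswering

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaForQuestionAnswering.from_pretrained('roberta-base')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)  # Batch size 1
start_positions = torch.tensor([1])
end_positions = torch.tensor([3])
outputs = model(input_ids, start_positions=start_positions, end_positions=end_positions)
loss, start_scores, end_scores = outputs[:3]  # loss plus start/end span logits
```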
There's currently an error with the test due to a segmentation fault. I'm fixing it on #2207, don't worry about it here.
Really nice job!
Really nice, thanks a lot @erenup |
Hi @julien-c, @thomwolf: this PR is based on #1386 and #1984.
This PR modifies run_squad.py and modeling_roberta.py to support RoBERTa.
This PR also uses multiprocessing to accelerate converting examples to features: conversion took 15 minutes before and now takes 34 seconds with 24 CPU cores. The number of threads defaults to 1, which matches the original single-process speed (a sketch of the pattern follows).
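A minimal sketch of the fan-out pattern described here; the function names and the toy feature dict are illustrative, not the repo's actual code:

```python
from functools import partial
from multiprocessing import Pool, cpu_count

def convert_example(example, max_seq_length=384):
    # Stand-in for the real per-example tokenization / feature building.
    return {"input_ids": example[:max_seq_length], "length": len(example)}

def convert_examples_to_features(examples, threads=1, max_seq_length=384):
    annotate = partial(convert_example, max_seq_length=max_seq_length)
    if threads <= 1:
        # Default path: identical to the original single-process speed.
        return [annotate(ex) for ex in examples]
    with Pool(min(threads, cpu_count())) as pool:
        return pool.map(annotate, examples)

if __name__ == "__main__":
    features = convert_examples_to_features([[1, 2, 3, 4]] * 1000, threads=4)
    print(len(features))  # 1000
```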
The result of RoBERTa-large on SQuAD 1.1:
{'exact': 87.26584673604542, 'f1': 93.77663586186483, 'total': 10570, 'HasAns_exact': 87.26584673604542, 'HasAns_f1': 93.77663586186483, 'HasAns_total': 10570, 'best_exact': 87.26584673604542, 'best_exact_thresh': 0.0, 'best_f1': 93.77663586186483, 'best_f1_thresh': 0.0}
This is slightly lower than a single run of Add RoBERTa question answering & Update SQuAD runner to support RoBERTa #1386. Parameters were:
python ./examples/run_squad.py --model_type roberta --model_name_or_path roberta-large \
    --do_train --do_eval --do_lower_case \
    --train_file data/squad1/train-v1.1.json --predict_file data/squad1/dev-v1.1.json \
    --learning_rate 1.5e-5 --num_train_epochs 2 --max_seq_length 384 --doc_stride 128 \
    --output_dir ./models_roberta/large_squad1 --per_gpu_eval_batch_size=3 --per_gpu_train_batch_size=3 \
    --save_steps 10000 --warmup_steps=500 --weight_decay=0.01
I hope this gap can be closed with `add_prefix_space=True`; I will run that comparison in the next few days (see the sketch below).
The result of RoBERTa-base is:
{'exact': 80.65279091769158, 'f1': 88.57296806525736, 'total': 10570, 'HasAns_exact': 80.65279091769158, 'HasAns_f1': 88.57296806525736, 'HasAns_total': 10570, 'best_exact': 80.65279091769158, 'best_exact_thresh': 0.0, 'best_f1': 88.57296806525736, 'best_f1_thresh': 0.0}
RoBERTa-base was also tested since it is easier to reproduce.
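A hedged sketch of enabling that option; depending on the library version, add_prefix_space is accepted by the tokenizer constructor or by encode():

```python
from transformers import RobertaTokenizer

# add_prefix_space makes the BPE treat a span-initial word the same as a
# mid-sentence word, which matters for extractive QA span boundaries.
tokenizer = RobertaTokenizer.from_pretrained('roberta-base', add_prefix_space=True)
ids = tokenizer.encode("Hello, my dog is cute")
```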
The result of bert-base-uncased is:
{'exact': 79.21475875118259, 'f1': 87.13734938098504, 'total': 10570, 'HasAns_exact': 79.21475875118259, 'HasAns_f1': 87.13734938098504, 'HasAns_total': 10570, 'best_exact': 79.21475875118259, 'best_exact_thresh': 0.0, 'best_f1': 87.13734938098504, 'best_f1_thresh': 0.0}
This was run to check the influence of multiprocessing on other models; it matches the bert-base-uncased result obtained without multiprocessing.
I hope someone else can help reproduce my results, thank you! I will keep looking for ways to improve the RoBERTa-large results.
SQuAD 1.1 model on Google Drive, roberta-large-finetuned-squad: