## Introduction
This directory contains our PyTorch implementation of Transformer-XL. Note that our state-of-the-art results reported in the paper were obtained by training the model on a large-scale TPU cluster, and our PyTorch codebase currently does not support distributed training. Here we provide two sets of hyperparameters and scripts:
- `*large.sh` scripts are for the SoTA setting with large models, which might not be directly runnable on a local GPU machine.
- `*base.sh` scripts are for the base models, which can be run on a few GPUs.
In our preliminary experiments, the PyTorch implementation produces results similar to those of the TF codebase under the same settings.
## Prerequisite
- PyTorch 0.4: `conda install pytorch torchvision -c pytorch`
## Data Preparation
`bash getdata.sh`
## Training and Evaluation
#### Replicate the "bpc = 1.06" result on `enwik8` with a 12-layer Transformer-XL
- Make sure the machine has **4 GPUs**, each with **at least 11 GB of memory**
- Training
`bash run_enwik8_base.sh train --work_dir PATH_TO_WORK_DIR`
- Evaluation
`bash run_enwik8_base.sh eval --work_dir PATH_TO_WORK_DIR`
#### Replicate the "PPL = 24.03" result on `wikitext-103` with Transformer-XL
- Make sure the machine has **4 GPUs**, each with **at least 11 GB of memory**
- Training
`bash run_wt103_base.sh train --work_dir PATH_TO_WORK_DIR`
- Evaluation
`bash run_wt103_base.sh eval --work_dir PATH_TO_WORK_DIR`
#### Other options:
- `--batch_chunk`: this option allows one to trade speed for memory. For `batch_chunk > 1`, the program splits each training batch into `batch_chunk` sub-batches and performs the forward and backward passes on each sub-batch sequentially, with the gradients accumulated and divided by `batch_chunk`. Hence, memory usage decreases roughly in proportion to `batch_chunk`, while computation time increases correspondingly (see the first sketch after this list).
- `--div_val`: when using adaptive softmax and adaptive embedding, the embedding dimension is divided by `div_val` from bin $i$ to bin $i+1$. This saves both GPU memory and parameters (see the second sketch after this list).
- `--fp16` and `--dynamic-loss-scale`: run in pseudo-fp16 mode (fp16 storage, fp32 math) with dynamic loss scaling.
- Note: to explore the `--fp16` option, please make sure the `apex` package is installed (https://github.com/NVIDIA/apex/).
- To see performance without the recurrence mechanism, simply use `mem_len=0` in all your scripts.
- To see performance of a standard Transformer without relative positional encodings or recurrence mechanisms, use `attn_type=2` and `mem_len=0`.
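
Below is a minimal, self-contained sketch of the gradient-accumulation behaviour described for `--batch_chunk`. The toy linear model, loss, and variable names are placeholders chosen for illustration; this is not the actual training loop in `train.py`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 1)                      # toy stand-in for the real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

data = torch.randn(32, 16)                    # one full training batch
target = torch.randn(32, 1)
batch_chunk = 4                               # e.g. --batch_chunk 4

optimizer.zero_grad()
for data_i, target_i in zip(data.chunk(batch_chunk, 0),
                            target.chunk(batch_chunk, 0)):
    loss = nn.functional.mse_loss(model(data_i), target_i)
    (loss / batch_chunk).backward()           # gradients accumulate in .grad
optimizer.step()                              # single update per full batch
```

Because only one sub-batch is active in the forward/backward pass at a time, peak activation memory shrinks with `batch_chunk`, while the number of forward/backward passes per update grows by the same factor.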
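Similarly, a short sketch of how `--div_val` shrinks the embedding dimension across frequency bins. The values of `d_embed`, `div_val`, and the cutoffs below are assumed for illustration and are not taken from the provided scripts.

```python
d_embed = 512                     # embedding dim of the first (most frequent) bin
div_val = 4                       # e.g. --div_val 4
cutoffs = [20000, 40000, 200000]  # assumed vocabulary cutoffs -> 4 frequency bins

for i in range(len(cutoffs) + 1):
    d_i = d_embed // (div_val ** i)   # dimension divided by div_val per bin
    print(f"bin {i}: embedding dim = {d_i}")
# bin 0: 512, bin 1: 128, bin 2: 32, bin 3: 8
```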
#### Other datasets:
- `Text8` character-level language modeling: check out `run_text8_base.sh`
- `lm1b` word-level language modeling: check out `run_lm1b_base.sh`