In Transformer architectures, which component is essential for capturing long-range dependencies in sequences?
RNN cells
Max Pooling Layer
Self-Attention Layer
CNN layers
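
The correct choice is the self-attention layer: each position computes attention weights over every other position, so dependencies of arbitrary distance are captured in a single step rather than through many recurrent or convolutional hops. A minimal NumPy sketch of scaled dot-product self-attention (function name, shapes, and random weights are illustrative, not from any particular library):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape
    (seq_len, d_model). Every position attends to every other position,
    which is what lets the layer model long-range dependencies."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    # (seq_len, seq_len) matrix of pairwise interactions between all positions
    scores = Q @ K.T / np.sqrt(d_k)
    # softmax over the key dimension to get attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # each output row mixes information from all 5 positions
```

Note that the attention matrix is dense: position 0 interacts with position `seq_len - 1` directly, regardless of how long the sequence is.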
