Short-term load forecasting (STLF) is essential to the stable and economic operation of power systems. However, existing STLF methods either cannot fit the temporal and nonlinear characteristics of load data simultaneously or do not account for the differing influence of individual input features on the predicted load, which seriously limits forecasting accuracy. To address these problems, this paper proposes an optimized STLF model called Attention-GRU. The model employs a gated recurrent unit (GRU) to capture the temporal and nonlinear characteristics of load data and uses an attention mechanism to highlight the critical input features. Experiments on a real-world dataset from Australia show that the proposed model outperforms baseline models based on the back-propagation (BP) neural network, long short-term memory (LSTM), and GRU in terms of forecasting accuracy.
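
To make the idea concrete, the following is a minimal sketch of how feature-level attention can be combined with a GRU, not the model evaluated in this paper. It rests on several assumptions: the attention scores come from a single linear layer followed by a softmax over the input features, the GRU uses the standard gate equations, and all weights are random and untrained. The helper names (`feature_attention`, `gru_step`, `forecast`, `init_params`) and the three example features are hypothetical.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feature_attention(x, Wa, ba):
    """Assumed attention form: softmax(Wa x + ba) gives one weight per feature."""
    scores = [s + b for s, b in zip(matvec(Wa, x), ba)]
    weights = softmax(scores)
    return weights, [w * xi for w, xi in zip(weights, x)]


def gate(W, U, b, x, h, act):
    """Compute act(W x + U h + b) element-wise."""
    return [act(p + q + c) for p, q, c in zip(matvec(W, x), matvec(U, h), b)]


def gru_step(x, h, p):
    """One standard GRU update of the hidden state h given input x."""
    z = gate(p["Wz"], p["Uz"], p["bz"], x, h, sigmoid)   # update gate
    r = gate(p["Wr"], p["Ur"], p["br"], x, h, sigmoid)   # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = gate(p["Wh"], p["Uh"], p["bh"], x, rh, math.tanh)  # candidate state
    # One common convention: z blends the old state with the candidate.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def init_params(n_features, n_hidden, seed=0):
    """Random, untrained weights (illustration only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {"Wa": mat(n_features, n_features), "ba": [0.0] * n_features}
    for g in "zrh":
        p["W" + g] = mat(n_hidden, n_features)
        p["U" + g] = mat(n_hidden, n_hidden)
        p["b" + g] = [0.0] * n_hidden
    p["w_out"] = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    p["b_out"] = 0.0
    return p


def forecast(sequence, p):
    """Apply attention to each time step, run the GRU, and read out one value."""
    h = [0.0] * len(p["w_out"])
    all_weights = []
    for x in sequence:
        weights, x_att = feature_attention(x, p["Wa"], p["ba"])
        all_weights.append(weights)
        h = gru_step(x_att, h, p)
    y = sum(w * hi for w, hi in zip(p["w_out"], h)) + p["b_out"]
    return y, all_weights


# Hypothetical normalized inputs per time step: [previous load, temperature, hour of day]
sequence = [[0.62, 0.40, 0.00], [0.58, 0.38, 0.04], [0.55, 0.37, 0.08], [0.57, 0.39, 0.12]]
params = init_params(n_features=3, n_hidden=4)
prediction, attention_weights = forecast(sequence, params)
```

In this sketch, `attention_weights` holds one weight vector per time step, each summing to 1, so the relative weight given to each input feature can be inspected directly. In a trained model these weights would be learned jointly with the GRU parameters.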