Graceful Shutdown or Restart

Kesa...大约 7 分钟golangdockerendlessgin

在学习 gin 时,官方文档介绍如何优雅地启动和停止服务

这里以 endlessopen in new window 为例,开始学习

1. Introduction

Zero downtime restarts for golang HTTP and HTTPS servers.

1.1 Features

  • Drop-in replacement for http.ListenAndServe and http.ListenAndServeTLS
  • Signal hooks to execute your own code before or after the listened to signal (SIGHUP, SIGUSR1, SIGUER2, SIGINT, SIGTERM, SIGSTP)
  • You can start multiple servers from one binary and endless will take care of the different sockets/ports assignments when restarting

2. Quick Start

go get -u github.com/fvbock/endless
package main

import (
	"github.com/fvbock/endless"
	"github.com/gin-gonic/gin"
	"log"
	"net/http"
)

func main(){
	router := gin.Default()
	router.GET("ping", func(c *gin.Context) {
		c.String(http.StatusOK,"pong")
	})
	err := endless.ListenAndServe(":9090",router)
	if err != nil {
		log.Fatal("listen and serve error:",err.Error())
	}
	log.Println("Server on 9090 stopped")
}

编译并启动程序:

$ go build -o ./main ./main.go
$ ./main
# ... some gin logs
[GIN-debug] GET    /ping                     --> main.main.func1 (3 handlers)
2021/11/23 20:04:53 31409 :9090

简单测试下:

$ curl "http://localhost:9090/ping"
pong

发送 SIGHUP 信号

$ kill -1 31409
---------
# ... some logs
2021/11/23 20:13:35 32215 :9090
2021/11/23 20:13:35 31409 Received SIGTERM.
2021/11/23 20:13:35 31409 Waiting for connections to finish...
2021/11/23 20:13:35 31409 Serve() returning...
2021/11/23 20:13:35 listen and serve error:accept tcp [::]:9090: use of closed network connection

​ 再次发送请求,可以看到程序依然能够接收请求

$ curl "http://0.0.0.0:9090/ping"
pong
------
[GIN] 2021/11/24 - 16:05:39 | 200 |      12.848µs |       127.0.0.1 | GET      "/ping"

3. Timeout and MaxHeaderBytes

There are three variables exported by the package that control the values set for DefaultReadTimeOut,DefaultWriteTimeOut, and MaxHeaderBytes on the innter http.server

DefaultReadTimeOut time.Duration
DefaultWriteTimeOut time.Duration
DefaultMaxHeaderBytes int

The endless default behaviour is to use the same defaults defined in net/http

These hava impact on endless by potentially not letting the parent process die until all connections are handled/finished

查看 endlessServer 源码:

type endlessServer struct {
	http.Server
	EndlessListener  net.Listener
	SignalHooks      map[int]map[os.Signal][]func()
	tlsInnerListener *endlessListener
	wg               sync.WaitGroup
	sigChan          chan os.Signal
	isChild          bool
	state            uint8
	lock             *sync.RWMutex
	BeforeBegin      func(add string)
}

endless.ListenAndServe

func ListenAndServe(addr string, handler http.Handler) error {
	server := NewServer(addr, handler)
	return server.ListenAndServe()
}

endless.NewServer:

func NewServer(addr string, handler http.Handler) (srv *endlessServer) {
// ...
	srv.Server.Addr = addr
	srv.Server.ReadTimeout = DefaultReadTimeOut
	srv.Server.WriteTimeout = DefaultWriteTimeOut
	srv.Server.MaxHeaderBytes = DefaultMaxHeaderBytes
	srv.Server.Handler = handler
// ...
}

这里在创建 endlessServer 时,会使用默认的配置,当然也可以在创建完之后单独设置

简单示例:

package main

import (
	"log"
	"net/http"
	"strconv"
	"time"

	"github.com/fvbock/endless"
	"github.com/gin-gonic/gin"
)

func main() {
	router := gin.Default()
	router.GET("ping", func(c *gin.Context) {
		time.Sleep(3 * time.Second)
		seq, _ := strconv.Atoi(c.Query("seq"))
		c.String(http.StatusOK, "pong-%d", seq)
	})
	server := endless.NewServer(":9090", router)
	server.ReadTimeout = 1
	server.WriteTimeout = 1
	server.MaxHeaderBytes = 8 << 20
	err := server.ListenAndServe()
	if err != nil {
		log.Fatal("listen and serve error:", err.Error())
	}
}

4. Hammer Time

To deal with hanging request on the parent after restarting endless will hammer the parent 60 seconds after receiving the shutdown signal from the forked child process. When hammered still running requests get terminated. This behaviour can be controlled by another exported variable:

DefaultHammerTime time.Duration

The default is 60 seconds. When set to -1 hammerTime() is not invoked automatically. You can then hammer the parent manually by sending SIGUSR2. This will only hammer the parent if it is already in shutdown mode. So unless the process had received a SIGTERM, SIGSTOP, or SIGINT(manually or by forking) brefore SIGUSR2 will be ignored

If you had hanging requests and the server got hammered you will see a log message like this :

2015/04/04 13:04:10 [STOP - Hammer Time] Forcefully shutting down parent

示例:

package main

import (
	"log"
	"net/http"
	"strconv"
	"time"

	"github.com/fvbock/endless"
	"github.com/gin-gonic/gin"
)

func main() {
	router := gin.Default()
	router.GET("ping", func(c *gin.Context) {
		time.Sleep(3 * time.Second)
		seq, _ := strconv.Atoi(c.Query("seq"))
		c.String(http.StatusOK, "pong-%d", seq)
	})
	endless.DefaultHammerTime = 1 * time.Second
	server := endless.NewServer(":9090", router)
	err := server.ListenAndServe()
	if err != nil {
		log.Fatal("listen and serve error:", err.Error())
	}
}

上例中将 hammer time 设置为 1s , 交易处理时间模拟为 3s

这里使用脚本来模拟多次请求:

#!/usr/bin/env sh
echo Start sending request
for((i=1;i<6;i++))
do
        url="http://0.0.0.0:9090/ping?seq=${i}"
        echo GET $url
        curl $url
        echo 
done

启动服务并执行脚本,在第三次请求时发送挂起信号kill -1

脚本输出:

Start sending request
GET http://0.0.0.0:9090/ping?seq=1
pong-1
GET http://0.0.0.0:9090/ping?seq=2
pong-2
GET http://0.0.0.0:9090/ping?seq=3
pong-3
GET http://0.0.0.0:9090/ping?seq=4

GET http://0.0.0.0:9090/ping?seq=5
pong-5

可以看到第四次请求是没有响应的,同时看日志这边

[GIN] 2021/11/24 - 18:04:09 | 200 |  3.000673064s |       127.0.0.1 | GET      "/ping?seq=1"
[GIN] 2021/11/24 - 18:04:12 | 200 |  3.000330231s |       127.0.0.1 | GET      "/ping?seq=2"
[GIN] 2021/11/24 - 18:04:15 | 200 |  3.001050857s |       127.0.0.1 | GET      "/ping?seq=3"
2021/11/24 18:04:17 14681 Received SIGHUP. forking.
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)

[GIN-debug] GET    /ping                     --> main.main.func1 (3 handlers)
2021/11/24 18:04:17 14711 :9090
2021/11/24 18:04:17 14681 Received SIGTERM.
2021/11/24 18:04:17 14681 Waiting for connections to finish...
2021/11/24 18:04:17 14681 [::]:9090 Listener closed.
2021/11/24 18:04:18 [STOP - Hammer Time] Forcefully shutting down parent
2021/11/24 18:04:18 14681 Serve() returning...
2021/11/24 18:04:18 listen and serve error:accept tcp [::]:9090: use of closed network connection
[GIN] 2021/11/24 - 18:04:21 | 200 |   3.00127917s |       127.0.0.1 | GET      "/ping?seq=5"

可以看到,第4次请求没有被处理完毕,父进程就已经终止了(hammer time < deal time)

5. Signals

The endless server will listen for the following signals:

  • syscall.SIGHUP: will trigger a fork/restart
  • syscall.SIGINT and syscall.SIGTERM will trigger a shutdown of the server (it will finish running request)
  • SIGUSR2: will trigger hammer Time
  • SIGUSR1 and SIGTSTP are listened for but do not trigger anything in the endless server itself.

You can hook your own functions to be called pre or post signal handling

5.1 Hook

package main

import (
	"log"
	"net/http"
	"strconv"
	"syscall"
	"time"

	"github.com/fvbock/endless"
	"github.com/gin-gonic/gin"
)

func preSigUsr1() {
	log.Println("pre SIGUSR1")
}

func postSigUsr1() {
	log.Println("pre SIGUSR1")
}

func main() {
	router := gin.Default()
	router.GET("ping", func(c *gin.Context) {
		time.Sleep(3 * time.Second)
		seq, _ := strconv.Atoi(c.Query("seq"))
		c.String(http.StatusOK, "pong-%d", seq)
	})
	server := endless.NewServer(":9090", router)
	server.SignalHooks[endless.PRE_SIGNAL][syscall.SIGUSR1] = append(server.SignalHooks[endless.PRE_SIGNAL][syscall.SIGUSR1], preSigUsr1)
	server.SignalHooks[endless.POST_SIGNAL][syscall.SIGUSR1] = append(server.SignalHooks[endless.POST_SIGNAL][syscall.SIGUSR1], postSigUsr1)
	err := server.ListenAndServe()
	if err != nil {
		log.Fatal("listen and serve error:", err.Error())
	}
}

Run and test:

2021/11/25 09:29:51 18649 :9090
------
kill -s SIGUSR1
------
2021/11/25 09:30:23 pre SIGUSR1
2021/11/25 09:30:23 18649 Received SIGUSR1.
2021/11/25 09:30:23 pre SIGUSR1

上例中使用了append来注册钩子函数,这里的SignalHooks实际上是:

type endlessServer struct {
	http.Server
    // ...
	SignalHooks      map[int]map[os.Signal][]func()
	// ...	
}
  • SignalHooks[int]: key 有两种取值,endless.PRE_SIGNALendless.POST_SIGNAL分别表示前置钩子和后置钩子
  • SignalHooks[int][os.Signal][]func(): 可以为某个信号类型 (os.Signal) 注册过个钩子函数

再回到 server.ListenAndServe 函数,看下钩子函数是如何触发的:

func (srv *endlessServer) ListenAndServe() (err error) {
	addr := srv.Addr
	if addr == "" {
		addr = ":http"
	}

	go srv.handleSignals()

	l, err := srv.getListener(addr)
	if err != nil {
		log.Println(err)
		return
	}

	srv.EndlessListener = newEndlessListener(l, srv)

	if srv.isChild {
		syscall.Kill(syscall.Getppid(), syscall.SIGTERM)
	}

	srv.BeforeBegin(srv.Addr)

	return srv.Serve()
}
func (srv *endlessServer) handleSignals() {
	var sig os.Signal

	signal.Notify(
		srv.sigChan,
		hookableSignals...,
	)

	pid := syscall.Getpid()
	for {
		sig = <-srv.sigChan
		srv.signalHooks(PRE_SIGNAL, sig)
		switch sig {
		case syscall.SIGHUP:
			log.Println(pid, "Received SIGHUP. forking.")
			err := srv.fork()
			if err != nil {
				log.Println("Fork err:", err)
			}
		case syscall.SIGUSR1:
			log.Println(pid, "Received SIGUSR1.")
		case syscall.SIGUSR2:
			log.Println(pid, "Received SIGUSR2.")
			srv.hammerTime(0 * time.Second)
		case syscall.SIGINT:
			log.Println(pid, "Received SIGINT.")
			srv.shutdown()
		case syscall.SIGTERM:
			log.Println(pid, "Received SIGTERM.")
			srv.shutdown()
		case syscall.SIGTSTP:
			log.Println(pid, "Received SIGTSTP.")
		default:
			log.Printf("Received %v: nothing i care about...\n", sig)
		}
		srv.signalHooks(POST_SIGNAL, sig)
	}
}
// ...
func (srv *endlessServer) signalHooks(ppFlag int, sig os.Signal) {
	if _, notSet := srv.SignalHooks[ppFlag][sig]; !notSet {
		return
	}
	for _, f := range srv.SignalHooks[ppFlag][sig] {
		f()
	}
	return
}

可以看到 endless 使用了一个 goroutine 取处理信号的钩子函数, 并会按顺序执行注册的钩子函数

5.2 Kill

Linux 的 kill 指令用于将信号发给进程,所有的信号可以由kill -l 列出:

$ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL
 5) SIGTRAP      6) SIGABRT      7) SIGBUS       8) SIGFPE
 9) SIGKILL     10) SIGUSR1     11) SIGSEGV     12) SIGUSR2
13) SIGPIPE     14) SIGALRM     15) SIGTERM     16) SIGSTKFLT
17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU
25) SIGXFSZ     26) SIGVTALRM   27) SIGPROF     28) SIGWINCH
29) SIGIO       30) SIGPWR      31) SIGSYS      34) SIGRTMIN
35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3  38) SIGRTMIN+4
39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12
47) SIGRTMIN+13 48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14
51) SIGRTMAX-13 52) SIGRTMAX-12 53) SIGRTMAX-11 54) SIGRTMAX-10
55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7  58) SIGRTMAX-6
59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX

其中常用的有:

SignalDescription
HUP终端挂断
INT中断(同 Ctrl + C)
QUIT退出(同 Ctrl + \)
KILL强制终止
TERM终止
CONT继续 (与STOP相反,fg/bg命令)
STOP暂停(同 Ctrl + Z)

信号是 Unix 、类 Unix 以及其他 POSIX 兼容的操作系统中进程间通讯的一种有限制的方式

它是一种异步的通知机制,用来提醒进程一个事件(硬件异常、程序执行异常、外部发出信号)已经发生。当一个信号发送给一个进程,操作系统中断了进程正常的控制流程。此时,任何非原子操作都将被中断。如果进程定义了信号的处理函数,那么它将被执行,否则就执行默认的处理函数

6. PID File

If you want to save actual pid file, you can change the BeforeBegin hook like this:

server := endless.NewServer("localhost:4242", handler)
server.BeforeBegin = func(add string) {
	log.Printf("Actual pid is %d", syscall.Getpid())
	// save it somehow
}
err := server.ListenAndServe()

endlessServer.ListenAndServe 方法中可以看到,在启动服务之前会调用此函数

func (srv *endlessServer) ListenAndServe() (err error) {
	// ...
	srv.BeforeBegin(srv.Addr)

	return srv.Serve()
}

Reference

  1. endlessopen in new window go doc
  2. killopen in new window
  3. 优雅重启服务open in new window 煎鱼
上次编辑于:
评论
  • 按正序
  • 按倒序
  • 按热度
Powered by Waline v2.15.2