package ui
templ About() {
<div>
<h1>about</h1>
<p>
Converge is a utility for troubleshooting builds on continuous integration servers.
It solves a common problem: the cause of a job failure is often difficult to determine.
This is complicated further by the fact that build jobs usually run on a build
farm where there is no access to the build agents or, in more modern environments,
in ephemeral containers.
</p>
<p>
With Converge it is possible to get remote shell access to such jobs. This works
by configuring the build job to connect to a Converge server using an agent program.
The agent program can be downloaded from within the CI job using curl or wget.
Next, an end user connects to the Converge server, a rendez-vous server that pairs
the client with the agent based on a common identifier specified by both the client and
the agent.
</p>
<h2>how it works</h2>
<p>
The basic principle of Converge is described below. Access to a running remote continuous integration
job is usually not possible without extensive access to the backend environment where jobs run.
However, the job can connect to a server running outside that environment, and so can the client.
</p>
<div>
<img src="../static/images/converge.svg" style="max-width: 800px"/>
</div>
<p>The connection between client and agent is established as follows:</p>
<ul>
<li>(1): the agent, started by the continuous integration job, connects to the Converge server over a websocket, which establishes a connection
similar to a TCP connection. When connecting, the agent specifies a
rendez-vous id. After connecting, the agent and the Converge server multiplex connections
over this single connection. This allows the agent to run an embedded SSH server and listen for incoming
connections, just as it normally would with a TCP listener. </li>
<li>(2): the client connects to the Converge server through SSH and also specifies the same rendez-vous id.
Since SSH cannot connect over websockets by itself, a helper program <code>wsproxy</code> is used as
a proxy command for SSH. Using <code>wsproxy</code>, the rendez-vous id is passed to the server as part
of the websocket URL. </li>
<li>(3): the Converge server pairs the two connections by matching their rendez-vous ids.
When a client sets up a connection, the server routes it to the agent identified by that
rendez-vous id and sets up a bi-directional connection. After this, Converge simply copies data between
client and agent. </li>
<li>(4): the agent runs an embedded SSH server, and incoming connections to the agent are handed over to
that server. That server in turn spawns a shell (any shell program can be used, such as bash, cmd.exe, or
powershell.exe) and connects it to the remote SSH session. At that moment an end-to-end SSH session is
established and the user can run interactive commands. </li>
</ul>
<p>There are a few special situations:</p>
<ul>
<li> If no rendez-vous id is specified, then a rendez-vous id is generated. </li>
<li> If the agent uses an id that is already in use by another agent, then the Converge server
generates a new rendez-vous id. </li>
</ul>
<p>The agent always prints the rendez-vous id and the command required to connect to it.
</p>
<h2>security</h2>
<p>
The setup is such that the connection from client (end-user) to server (agent on CI job)
is end-to-end encrypted. The Converge server itself is no more than a bitpipe which pumps
data between client and agent.
</p>
<p>Using authorized keys is a secure way of connecting. When running the agent, the authorized keys
must be put in a file, allowing only the designated users to connect. The file containing authorized keys
can also be edited during a session with the agent, allowing more people to be added when required without
having to start over again.
Using authorized keys is made easy by the
<a href="usage.html">usage</a> page, which provides the exact commands to execute for
the target environment. Users who are hesitant to use their own public key can instead
generate a separate SSH key pair using <code>ssh-keygen</code> and use that.
</p>
<p>To be able to use Converge, you must already have access to the configuration of a build job.
Having that access means it is possible to execute any command on a build agent. The Converge
agent is started by the build job and does not have any additional rights compared to what you
could script in the continuous integration job definition.
</p>
<p>Converge does not provide any stealth features to hide it. The public sessions page shows all
agents and clients, including details about each of them. The idea is that Converge should
be lightweight and easy to use. There is no reason to hide the fact that someone is debugging
a continuous integration job. All sessions are also logged using standard kubernetes tooling
(such as fluentbit/filebeat and loki/elasticsearch, depending on the environment). This logging
covers only the details about the sessions, not what the user is doing inside a session. In addition, Converge
provides a prometheus metrics endpoint that allows user sessions to be tracked over time, after
the fact. This data is also made accessible through a grafana dashboard.
</p>
<h2>SSH and SFTP</h2>
<p>
Both SSH and SFTP are supported. Multiple concurrent sessions to the same agent are allowed,
as are multiple agents.
</p>
<h2>timeouts</h2>
<p>
The agent has a timeout mechanism so that jobs do not hang indefinitely
waiting for a connection. This makes sure that jobs do not keep
build agents occupied for a long time. By default, the agent exits with status 0 when
the last client logs out. The timeout is an inactivity timeout. Activity is
detected as follows:
</p>
<ul>
<li><b>ssh</b>: any key press is considered activity</li>
<li><b>sftp</b>: any output from the server side is considered activity. This makes
sure that long downloads are not killed by the timeout. A simple <code>ls</code> command
in an sftp session also counts as activity, since the server outputs the result of the command. </li>
</ul>
<p>When the user touches a <code>.hold</code> file, the agent keeps waiting for connections even
after the last client logs out, still subject to the inactivity timeout. Without this file, the agent
exits when the last user has logged out.
</p>
<h2>remote shell usage</h2>
<p>
The agent supports a <code>--shells</code> command-line option that prepends a comma-separated
list of shells to the default search path for shells, e.g.
<code>--shells zsh,csh,sh</code> on Linux or <code>--shells cmd,powershell</code> on
Windows.
</p>
<p>
The agent sets an <code>agentdir</code> environment variable that points to
the directory where the agent is running.
</p>
<p>The user gets notifications from the agent whenever something important happens, such
as the session being close to its timeout.
</p>
<h2>other tools</h2>
<p>A similar solution can be obtained with existing tools such as
<a href="https://github.com/namespacelabs/breakpoint">breakpoint</a> in combination
with a websocket tunneling tool such as
<a href="https://github.com/erebe/wstunnel">wstunnel</a>. However, these solutions have
some problems that Converge tries to address:
</p>
<ul>
<li>Breakpoint uses an embedded SSH server, which is a really good idea, but it
uses the QUIC protocol for connecting to a rendez-vous server. The rendez-vous server then
exposes a random port for every client. This makes deployment on kubernetes really hard,
since fixed ports must be used there, and QUIC is also not a widely supported protocol.</li>
<li>The problem with the random ports can be solved by running wstunnel together
with the breakpoint server in a kubernetes pod, where wstunnel forwards traffic from an
external websocket connection to the local random port that the breakpoint server is listening on.</li>
<li>Breakpoint leaves it open how users install the breakpoint executable (agent). </li>
<li>Because of the hacky nature of this setup, it is very difficult for users to use
and to troubleshoot when things go wrong. </li>
</ul>
<p>Converge server addresses these issues in the following ways:</p>
<ul>
<li>Using the websocket protocol for both agents and clients, which provides a fixed port and
a well-supported protocol for kubernetes deployments. Websockets are also supported by
kubernetes ingress controllers, which makes Converge easy to deploy on kubernetes.
To make this work with SSH, which does not natively support websockets, a proxy command,
<code>wsproxy</code>, is provided that allows SSH to connect over websockets.
</li>
<li>Providing online documentation whose instructions take into account the
hostname and protocol of the running Converge server, allowing users to copy and paste
instructions that can be used without modification. On the usage page, users
can even generate the correct agent startup commands and client connection commands
for the type of shell they are connecting to. </li>
<li>Providing out-of-the-box downloads of the required software from the Converge server. This makes sure
the client and agent software is always up to date and can be downloaded in any continuous integration
job without having to package the required executables in an ad-hoc way.
In addition, a protocol version check is performed. </li>
<li>Giving users friendly error messages in most cases when things do not work
out, thanks to <code>wsproxy</code>. This SSH proxy command communicates with Converge
and provides additional information to the user. </li>
<li>A live sessions page showing the sessions that are currently running, which provides
additional feedback about them. </li>
<li>Interactivity in the user's session with notifications about timeouts and a very
simple inactivity timeout mechanism. </li>
<li>The ability for the user to choose the remote shell. </li>
<li>Support for Unix-like shells such as bash, as well as command prompt and powershell. </li>
<li>Observability of the non-functional aspects of Converge and of agent and client sessions through
prometheus monitoring. For session monitoring, a separate grafana dashboard is provided. </li>
</ul>
</div>
}
templ AboutTab() {
@BasePage(1) {
@About()
}
}