converge/pkg/server/ui/about.templ

package ui
templ About() {
<div>
<h1>about</h1>
<p>
Converge is a utility for troubleshooting builds on continuous integration servers.
It solves a common problem where the cause of a job failure is difficult to determine.
This is complicated further by the fact that build jobs usually run on a build
farm where there is no access to the build agents or, in more modern environments,
in ephemeral containers.
</p>
<p>
With Converge it is possible to get remote shell access to such jobs. This works
by configuring the build job to connect to a Converge server using an agent program.
The agent program can be downloaded from within the CI job using curl or wget.
Next, an end user connects to the Converge server, which acts as a rendez-vous server:
it connects the client and the agent together based on a common identifier specified
by both.
</p>
<h2>how it works</h2>
<p>
The steps involved are as follows:
<ul>
<li>The agent connects to the converge server and specifies an id, the so-called rendez-vous id,
identifying the agent.
The agent outputs an example command that can be used to connect to this agent.
</li>
<li>The agent sets up connection multiplexing together with the converge server,
which allows it to listen for incoming connections.
</li>
<li>The agent uses this to run an embedded SSH server that listens for
incoming connection requests from clients.
</li>
<li>The client/user connects to the converge server using the command specified by the agent.
This uses the same id as that used by the agent. The converge server can now match these
ids and set up an end-to-end connection from client to agent. The role of the converge server
is simply to match these ids and connect the two websocket connections (from agent
and from client) together by copying data between them as it arrives.
</li>
<li>The embedded SSH server now performs authentication. After a successful login,
a shell is spawned and the session is established. The shell can be any linux
shell, but command prompt and powershell are also possible.
The connection is practically identical to a regular terminal connection. To
achieve this, the shell is made to believe that it is connected to a
terminal.
</li>
</ul>
</p>
<p>With regard to the rendez-vous id, note the following:
<ul>
<li> If no id is specified, then an id is generated. </li>
<li> If the agent uses an id already in use by another agent, then the converge server will
generate a new id. </li>
</ul>
The agent will always print the id and command required to connect to it to standard output.
</p>
<h2>security</h2>
<p>
The setup is such that the connection from client (end-user) to server (agent on CI job)
is end-to-end encrypted. The Converge server itself is no more than a bitpipe which pumps
data between client and agent.
</p>
<p>Using authorized keys is a secure way of connecting. When running the agent, the authorized keys
must be put in a file, allowing only the designated users to connect. The file containing authorized keys
can also be edited during a session with the agent, allowing more people to be added when required without
having to start over again.
Using authorized keys is made easy through the
<a href="usage.html">usage</a> page, which provides the exact commands to execute based
on the target environment. If users are hesitant to use their public key, it is also possible
to generate a separate SSH key pair using <code>ssh-keygen</code> and use that instead.
</p>
<p>To be able to use Converge, you must already have access to the configuration of a build job.
Having that access means it is possible to execute any command on a build agent. The Converge
agent is started by the build job and does not have any additional rights compared to what you
could script in the continuous integration job definition.
</p>
<p>Converge does not provide any stealth features to hide itself. The public sessions page shows all
agents and clients, including details about both. The idea is that it should
be lightweight and easy to use. There is no reason to hide the fact that someone is debugging
a continuous integration job. Also, all sessions are logged using standard kubernetes tooling
such as fluentbit/filebeat and loki/elasticsearch, depending on the environment. This logging includes
only the details about the sessions, not what the user is doing inside a session. Converge also
provides a prometheus metrics endpoint, which allows user sessions to be tracked over time after
the fact. This data is also made accessible through a grafana dashboard.
</p>
<h2>SSH and SFTP</h2>
<p>
Both ssh and sftp are supported. Multiple concurrent sessions to the same agent are allowed,
as are multiple agents.
</p>
<h2>timeouts</h2>
<p>
There is a timeout mechanism in the agent such that jobs do not hang indefinitely
waiting for a connection. This mechanism makes sure that build jobs do not keep
build agents occupied for a long time. By default, the agent exits with status 0 when
the last client exits after logging in. The timeout is an inactivity timeout. Activity is
detected as follows:
<ul>
<li><b>ssh</b>: any key press is considered activity</li>
<li><b>sftp</b>: any output from the server side is considered activity. This is done to
make sure that longer downloads cannot be killed by a timeout. A simple <code>ls</code> command
in an sftp session will also lead to activity since the server will output the result of the command. </li>
</ul>
</p>
<p>When the user touches a .hold file, the agent keeps waiting for connections even
after the last client logs out, still subject to the timeout. Without this file, the agent
exits when the last user has logged out.
</p>
<h2>remote shell usage</h2>
<p>
The agent supports a <code>--shells</code> command-line option by which a comma-separated
list of shells can be prepended to the default search path for shells, e.g.
<code>--shells zsh,csh,sh</code> (linux) or <code>--shells cmd,powershell</code>
(windows).
</p>
<p>
The agent sets an <code>agentdir</code> environment variable that points to
the directory where the agent is running.
</p>
<p>The user will get notifications from the agent any time something important happens such
as the session being close to timeout.
</p>
<h2>other tools</h2>
<p>A similar solution can be obtained by combining existing tools such as
<a href="https://github.com/namespacelabs/breakpoint">breakpoint</a>
with a websocket tunneling tool such as
<a href="https://github.com/erebe/wstunnel">wstunnel</a>.
There are, however, some problems with these solutions that converge is
trying to address:
</p>
<p>
<ul>
<li>Breakpoint uses an embedded SSH server, which is a really good idea, but
uses the QUIC protocol for connecting to a rendez-vous server. The rendez-vous server then
exposes a random port for every client. This makes deployment on kubernetes really hard,
since fixed ports must be used there, and QUIC is also not a widely supported protocol.</li>
<li>The problem with the random ports can be solved by using wstunnel running together
with breakpoint server in a kubernetes pod, where wstunnel can forward traffic over an
external websocket connection to the local random port that breakpoint server is listening on.</li>
<li>Breakpoint leaves open how users install the breakpoint executable (agent). </li>
<li>Because of the hacky nature of this setup, it is very difficult for users to use
and troubleshoot when things go wrong. </li>
</ul>
</p>
Converge server addresses these issues in the following ways:
<ul>
<li>Use the websocket protocol both for agents and for clients, providing a fixed port and
a supported protocol for kubernetes deployment. Websockets are also supported by
kubernetes ingress controllers, which makes it easy to deploy on kubernetes.
To make this work with SSH, which does not natively support websockets, a proxy command
<code>wsproxy</code> is provided that allows SSH to connect using websockets.
</li>
<li>Providing online documentation where the instructions take into account the
hostname and protocol where converge is running, allowing users to cut and paste
instructions that can be used without modification. On the usage page, users
can even generate the correct agent startup commands and client connection commands
based on the type of shell they are connecting to. </li>
<li>Converge server provides out-of-the-box downloads of the required software. This makes sure
client and server are always up to date and can be downloaded in any continuous integration
job without having to package the required executables in an ad-hoc way.
In addition, a protocol version check is done. </li>
<li>User-friendly error messages can be given to users in most cases when things do not work
out because of <code>wsproxy</code>. This is an SSH proxy command that communicates with converge
and provides additional information to the user. </li>
<li>A live screen showing the current sessions that are running. The sessions webpage provides
additional feedback about the running sessions. </li>
<li>Interactivity in the user's session with notifications about timeouts and a very
simple inactivity timeout mechanism. </li>
<li>Possibility for the user to define the remote shell to use. </li>
<li>Support for unix-like shells such as bash, as well as command prompt and powershell. </li>
<li>Observability w.r.t. non-functionals of converge and of agent and client sessions through
prometheus monitoring. For session monitoring, a separate grafana dashboard is provided. </li>
</ul>
<p>
</p>
</div>
}
templ AboutTab() {
@BasePage(1) {
@About()
}
}