Systems Administration and Maintenance
د/عبدالملك الحميري
Chapter 1: Introduction to
System Administration
Outline
System administrator
What does a sysadmin do?
Ethical issues
Administration Challenges
Bugs and emergent phenomena
The meta principles of system administration
System Administration of Datacenter
Types of Administrators/Users
A system administrator, or sysadmin, is a person who is
responsible for the upkeep, configuration, and reliable operation of
computer systems; especially multi-user computers, such as servers.
The system administrator seeks to ensure that the uptime,
performance, resources, and security of the computers he or she
manages meet the needs of the users, without exceeding the budget.
To meet these needs, a system administrator may acquire, install, or
upgrade computer components and software; provide routine
automation; maintain security policies; troubleshoot; train or supervise
staff; or offer technical support for projects.
User account management
Hardware management
Computers (Servers / Workstations)
Hardware (CPU, Memory, Storage, etc)
Network
Software management
Operating System
Application Software (Mail service, Web service, Business software, …)
Perform filesystem backups, restores
Install and configure new software and services
Keep systems and services operating
Monitor system and network
Troubleshoot problems
Maintain documentation
Audit security
Help users, performance tuning, and more!
The subject matter of system administration includes computer
systems and the way people use them in an organization. This entails
knowledge of operating systems and applications, as well as
hardware and software troubleshooting, and also knowledge of the
purposes for which people in the organization use the computers.
User Ids
Mail
Home directories (quotas, drive capacities)
Default startup files (paths)
Permissions, group memberships, accounting and restrictions
Communicating policies and procedures
Disabling / removing user accounts
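As one illustration of the "Permissions" item above, here is a minimal Python sketch that flags home directories writable by group or others. The helper name `find_loosely_permissioned` and the policy it checks (no group or other write access) are assumptions for illustration, not something the slides prescribe.

```python
import stat
from pathlib import Path


def find_loosely_permissioned(home_root):
    """Return names of directories under home_root writable by group or others.

    Hypothetical helper: the "no group/other write" policy is an assumption.
    """
    flagged = []
    for entry in sorted(Path(home_root).iterdir()):
        if not entry.is_dir():
            continue
        mode = entry.stat().st_mode
        # S_IWGRP / S_IWOTH are the group-write and other-write permission bits.
        if mode & (stat.S_IWGRP | stat.S_IWOTH):
            flagged.append(entry.name)
    return flagged
```

On a real system this might be called as `find_loosely_permissioned("/home")`; a site's actual policy may well differ.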
Capacity planning
Inventory
Hardware evaluation and purchase
Adding and removing hardware
Configuration
Cabling, wiring, DIP switches, etc.
Device driver installation
System configuration and settings
User notification and documentation
Perhaps the most important aspect!
Disk and backup media capacity planning
Performance, network and system impact
Disaster recovery
Onsite/Offsite
Periodic testing
Multiple copies
User communication
Schedules, restore guarantees and procedures, loss tolerance
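The "Multiple copies" and "Periodic testing" items above can be sketched in Python as follows. This is a minimal illustration, assuming a single file is copied to several destination directories and each copy is checked against a SHA-256 checksum. The function names are hypothetical, and a real backup system would also handle whole directory trees, scheduling, and offsite media.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_with_verification(source, destinations):
    """Copy source into each destination directory and verify every copy.

    Returns a dict mapping each copy's path to True (checksum matches)
    or False (copy differs from the source).
    """
    expected = sha256_of(source)
    results = {}
    for dest_dir in destinations:
        dest_dir = Path(dest_dir)
        dest_dir.mkdir(parents=True, exist_ok=True)
        copy_path = dest_dir / Path(source).name
        shutil.copy2(source, copy_path)  # copy2 also preserves timestamps
        results[str(copy_path)] = sha256_of(copy_path) == expected
    return results
```

Re-running `sha256_of` on stored copies at a later date is one simple form of periodic testing. It does not replace an actual restore test.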
Evaluation of software
Downloading and building (compiling and tweaking)
Installation
Maintenance of multiple versions
Security
Patches and updates
User notification, documentation
Hardware and services functioning and operational
Capacity
Disk, RAM, CPU, network
Security
Passwords
Break-ins
System logs
Examination
Periodic rotation and truncation
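A small Python sketch of "periodic rotation and truncation" for system logs, using the standard library's `RotatingFileHandler`. The size limit and number of retained files are assumed values for illustration. On production systems rotation is often done by a dedicated tool rather than by the application itself.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_rotating_logger(log_path, max_bytes=1024, backup_count=3):
    """Return a logger whose file is rotated once it reaches max_bytes.

    Up to backup_count old files are kept (log_path.1, log_path.2, ...);
    anything older is discarded, which bounds total disk usage.
    The default limits are illustrative assumptions.
    """
    logger = logging.getLogger(f"rotating-demo:{log_path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:  # avoid attaching a second handler on repeat calls
        handler = RotatingFileHandler(
            log_path, maxBytes=max_bytes, backupCount=backup_count
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Size-based rotation is used here because it is easy to demonstrate. The standard library also provides `TimedRotatingFileHandler` for rotation on a schedule.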
Problem discovery, diagnosis, and resolution
Root cause analysis
Often quite difficult!
Often requires
Broad and thorough system knowledge
Outside experts
Luck
Expediency
Administrative policies and procedures
Backup media locations
Hardware
Location
Description, configuration, connections
Software
Install media (or download location)
Installation, build, and configuration details
Patches installed
Acceptable use policies
System logging and audit facilities
Evaluation and implementation
Monitoring and analysis
Traps, auditing and monitoring programs
Unexpected or unauthorized use detection
Monitoring of security advisories
Security holes and weaknesses
Live exploits
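To make "unexpected or unauthorized use detection" concrete, the following Python sketch counts failed login attempts per source address. The line format is an assumption modeled on OpenSSH's "Failed password" messages, and both the threshold and the function name are hypothetical choices for illustration.

```python
import re
from collections import Counter

# Assumed log line format, modeled on OpenSSH "Failed password" messages.
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")


def suspicious_sources(log_lines, threshold=3):
    """Return {source_address: failure_count} for sources at or above threshold."""
    failures = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(2)] += 1  # group 2 is the source address
    return {src: count for src, count in failures.items() if count >= threshold}
```

A usage example with made-up log lines (the addresses come from documentation-only ranges):

```python
sample = [
    "sshd[101]: Failed password for root from 203.0.113.5 port 4242 ssh2",
    "sshd[102]: Failed password for invalid user admin from 203.0.113.5 port 4243 ssh2",
    "sshd[103]: Failed password for root from 203.0.113.5 port 4244 ssh2",
    "sshd[104]: Failed password for alice from 198.51.100.7 port 5000 ssh2",
    "sshd[105]: Accepted password for alice from 198.51.100.7 port 5001 ssh2",
]
print(suspicious_sources(sample))
```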
A system administrator’s responsibilities might include:
1. Applying operating system updates, patches, and configuration changes.
2. Installing and configuring new hardware and software.
3. Adding, removing, or updating user account information, resetting
passwords, etc.
4. System performance tuning.
A. Assess the problem and establish numeric values that categorize
acceptable behavior.
B. Measure the performance of the system before modification.
C. Identify the part of the system that is critical for improving the
performance. This is called the bottleneck.
D. Modify that part of the system to remove the bottleneck.
E. Measure the performance of the system after modification.
F. If the modification makes the performance better, adopt it. If the
modification makes the performance worse, put it back the way it was.
5. Documenting the configuration of the system.
6. Maintaining security.
7. Performing routine audits of system and software.
8. Performing backups.
9. Analyzing system logs and identifying potential issues with computer
systems.
10. Troubleshooting any reported problems.
11. Introducing and integrating new technologies into existing data center
environments.
12. Answering technical queries.
13. Ensuring that the network infrastructure is up and running.
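The performance-tuning steps A to F under item 4 can be sketched as a small measure, modify, measure loop. This is a minimal Python illustration. The names `tune` and `time_workload` are hypothetical, and the measurement is assumed to be a "lower is better" number such as elapsed seconds.

```python
import time


def time_workload(workload, repeats=5):
    """Return the best wall-clock time in seconds over several runs of workload()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best


def tune(measure, apply_change, revert_change):
    """Keep a modification only if the measurement improves.

    measure() returns a number where lower is better (an assumption).
    Returns (kept, before, after).
    """
    before = measure()        # step B: measure before modification
    apply_change()            # step D: modify the suspected bottleneck
    after = measure()         # step E: measure after modification
    if after < before:        # step F: adopt only if it got better
        return True, before, after
    revert_change()           # otherwise put it back the way it was
    return False, before, after
```

Here `measure` could be, for example, `lambda: time_workload(run_nightly_report)`, where `run_nightly_report` stands in for whatever workload is being tuned. Steps A and C (setting acceptable values and locating the bottleneck) still need human judgment and are not automated in this sketch.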
Because computer systems are human–computer communities, there
are ethical considerations involved in their administration. Even if
certain decisions can be made objectively, e.g. for maximizing
productivity or minimizing cost, one must have a policy for the use and
management of computers and their users. Some decisions have to
be made to protect the rights of individuals.
A system administrator has many responsibilities and constraints to
consider. Ethically, the first responsibility must be to the greater
network community, and then to the users, empowering them in the
production of real work.
System administration is not just about installing operating systems. It
is about planning and designing an efficient community of computers
so that real users will be able to get their jobs done. That means:
Designing a network which is logical and efficient.
Deploying large numbers of machines which can be easily
upgraded later.
Deciding what services are needed.
Planning and implementing adequate security.
Providing a comfortable environment for users.
Developing ways of fixing errors and problems which occur.
Keeping track of and understanding how to use the enormous
amount of knowledge which increases every year.
So, a system administrator needs:
o Broad knowledge of hardware and software
o To balance conflicting requirements:
    o Short-term vs. long-term needs
    o End-user vs. organizational requirements
    o Service provider vs. police model
o To work well and efficiently under pressure
o 24x7 availability
o Flexibility, tolerance, and patience
o Good communication skills
The term system clearly implies an operation that is systematic, or
predictable – but, unlike simple mechanical systems, like say a clock,
computers interact with humans in a complex cycle of feedback,
where uncertainty can enter at many levels. That makes human–
computer systems difficult to predict, unless we somehow fix the
boundaries of what is allowed, as a matter of policy.
Principle 1 (Policy is the foundation). System administration
begins with a policy – a decision about what we want and what
should be, in relation to what we can afford.
Principle 2 (Predictability). The highest level aim in system
administration is to work towards a predictable system.
Predictability has limits. It is the basis of reliability, hence trust and
therefore security.
Principle 3 (Scalability). Scalable systems are those that grow in
accordance with policy; i.e. they continue to function predictably,
even as they increase in size.
Duties of a Datacenter Engineer:
• Installs, builds, upgrades, configures, and provisions servers from
scratch across several different platforms, and maintains servers as
customers grow.
• Performs basic monitoring, troubleshooting, and repair of all aspects of
servers, operating systems, and hardware.
• Responds to customer impacting events. Upgrades and
downgrades servers. Removes hardware from server software.
• Monitors all electrical, mechanical, and emergency generator
systems and reports issues as appropriate while the facility
technician is away.
• Performs routine maintenance activities on customer server
environments.
• Assists in the planning of server environments.
• Participates in analyzing and summarizing data center technology
and critical services engineering agreements.
• Collaborate with other teams to ensure technology solutions and
business needs align
• Contribute to web site design and maintenance for vendor
Management team
• Support internal and external team communication plans, for
instance coordinating status reporting across the team, planning
team building activities, etc.
• Respond to, act on, and resolve requests and faults logged both
internally and externally, and escalate problems and issues to the
Operating Team.
• Report broken servers.
• Active monitoring of all servers.
• Work with Customer Operation Engineers in configuring customer
requirements.
• Work closely with third-party providers.
In a larger company, the following may all be separate positions within a
computer support or Information Services (IS) department. In a
smaller group they may be shared by a few System Administrators, or
even a single person.
Database Administrator
Network administrator
Security Administrator
Web Administrator
Technical support
Computer operator
A database administrator (DBA) maintains a database system, and
is responsible for the integrity of the data and the efficiency and
performance of the system.
A network administrator maintains network infrastructure such as
switches and routers, and diagnoses problems with these or with the
behavior of network-attached computers.
A security administrator is a specialist in computer and network
security, including the administration of security devices such as
firewalls, as well as consulting on general security measures.
A web administrator maintains web server services (such as IIS or
Apache) that allow for internal or external access to web sites. Tasks
include managing multiple sites, administering security, and
configuring necessary components and software. Responsibility may also
include software change management.
Technical support staff respond to individual users’ difficulties with
computer systems, provide instructions and sometimes training, and
diagnose and solve common problems.
A computer operator performs routine maintenance and upkeep,
such as changing backup tapes or replacing failed drives in a RAID
array. Such tasks usually require physical presence in the room with
the computer and, while less skilled than sysadmin tasks, require a
similar level of trust, since the operator has access to possibly
sensitive data.