IT CONSULTING JOBS

Thursday, March 10, 2016

Job Title: Operations Reliability Engineer (3102016- BIAS- IOM-SANR)
Location: San Ramon, CA

Duration: Long Term

Job Function/Responsibilities:
As an Operations Reliability Engineer, you will be a key member of the ORE team, working to improve the reliability and performance of our operations. The ORE team acts as a first responder and is ultimately responsible for ensuring that our cloud infrastructure services are up and running. You will work shoulder-to-shoulder with our engineering teams to deliver, build and operate the next generation of IaaS, PaaS and SaaS cloud infrastructure and services, focusing on automation, availability and performance. You will diagnose and resolve latent and systemic reliability issues across the entire stack: hardware, software, services, database, application and network. You will also drive standardization efforts across multiple disciplines and services.

Willing to roll up your sleeves and debug/tune/code/fix
Strong background and experience in scripting and automation
Represent the ORE organization with other teams in design reviews and operational readiness exercises for new and existing services.
Work with internal operational teams to drive the availability, latency, scalability and efficiency of services and applications by instilling reliability into the operational life cycle, with a focus on fault-tolerant approaches
Ensure the IaaS, PaaS & SaaS cloud infrastructure and services platform meets or exceeds organizational goals for availability, capacity, efficiency, scalability and performance by engineering reliability into software and systems
Perform proactive daily system monitoring including reviewing system and application logs as well as responding to, triaging, troubleshooting and remediating incidents
Repair and recover from hardware or software failures. Coordinate and communicate with impacted stakeholders and clients, escalating where appropriate
Work closely with infrastructure services, software support, security, development and engineering teams to help build, maintain and extend the IaaS, PaaS & SaaS "live" services. Contribute to new and ongoing technology projects in performance, high availability and scalability, including partitioning, sharding, and dynamic provisioning and de-provisioning of systems to match current load.
Review the entire environment and execute initiatives to reduce failures and defects and to improve overall performance.
Design, develop and execute automated tests to validate solutions and environments.
Monitor and troubleshoot issues across the entire stack - hardware, software, application and network.
Participate in performance analysis and tuning, service capacity planning and demand forecasting
Document current and future configuration processes and policies.
Assist with the implementation and development of SRE tools and applications
Manage and support SRE tools and applications
Participate in a 24x7 rotation for production issue escalations.

Qualifications & Requirements:

BS or MS degree in Computer Science or a related field
3-5 years of experience administering Linux systems and infrastructure in a SaaS/cloud production environment (AWS/private cloud)
Good understanding of service orientation methodology
Strong working knowledge of networking, packet tracing, understanding latency and throughput.
Strong working knowledge of Linux operating systems, their underlying components, system statistics, performance tuning, filesystems and I/O.
Solid understanding of systems and application design, including the operational trade-offs of various designs
Practical knowledge of various aspects of service design, including messaging protocols & behavior, caching strategies and software service design practices
Specialist in at least 2-3 of the following: Pivotal Cloud Foundry, OpenStack, Hadoop, Pivotal HD, HAWQ, MSQL, RabbitMQ, Redis, Jenkins, IaaS [Compute – Linux, Storage, Network – SDN – Juniper Contrail, Palo Alto Networks firewalls, F5 load balancers]
Experience administering customer-facing, high-availability, large-scale environments.
Experience in one or more of the following languages: Shell, Python, PHP or Perl
Must have an understanding of building and managing large-scale systems and application architectures
Prior experience with configuration and maintenance of common applications such as Apache, MySQL, DHCP, SSH, DNS, etc.
Proficient in one or more of the following monitoring and logging tools: New Relic, AppDynamics, Neustar, Gomez, Nimsoft, Zabbix, Nagios, Ganglia, Cacti, Splunk, Logstash, Graphite.
Working knowledge of Linux, TCP/IP, and web services
Prior experience with one or more of the following tools: Chef, Puppet, BOSH
Experience working in Agile environments
Solid verbal and written communication skills.

To apply, please click here: APPLY NOW
