CAIA's Career Center is an easy-to-use, comprehensive resource connecting job seekers with employers in the growing alternative investment (AI) field. Use your knowledge and credibility to advance your career or build a talented team for your organization. Opportunities targeted to CAIA Charterholders are prioritized.
To search for jobs specifically for CAIA Charterholders or those pursuing the CAIA Charter, enter “CAIA” in the search panel.
This lets you search for CAIA-specific roles globally.
The Site Reliability Engineer will be part of a horizontal function responsible for ensuring that the practices, processes and tools are in place to improve the stability and functionality of each application. This team will ensure the highest level of quality and success in support of technical issues, disaster recovery (DR) testing, and hardware/software updates. The SRE is expected to implement DevOps practices, automate the release process, and develop scripts to automate manual processes. As a Site Reliability Engineer for our technology teams, you will have the opportunity to build and maintain complex applications and to maintain vendor applications from a development and risk perspective. Much of our support and software development focuses on optimizing existing systems, building infrastructure and reducing manual work through automation.
You'll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment you'll take the lead on relevant projects, backed by an organization that provides the support and mentorship you need to learn and grow. As an SRE, you'll focus on running better production applications and systems.
Collaborate across Application Development, Product and Production Management to establish and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs) and Error Budgets
Implement the telemetry and observability required to monitor and measure quality of service in real time against the established SLOs
Design, code, test and deliver software to automate manual operational work
Engage with development teams throughout the software life cycle to help build software for reliability and scale, minimizing later refactoring or changes
Identify application patterns and use analytics to support better service level objectives
Design self-healing and resiliency patterns
Strong focus on automation and processes. Design, implement, improve and utilize key monitoring tools
Design automated software and product upgrades, change management, and release management solutions
Expertise in Incident, Problem and Change Management processes and tools
Manage, track and validate all changes to the Production and Disaster Recovery environments
Assist with priority incidents to quickly eliminate their impact
Escalate issues and risks effectively across the support framework when necessary
Ability to align IT service offerings with business strategies, goals, and objectives
Troubleshoot key technical issues, or escalate and work with the appropriate technology teams to provide solutions
Respond promptly to service requests from client-facing support
Bachelor's degree or equivalent industry experience; a business-focused master's or other advanced degree preferred
7 years of financial services experience with a large financial services firm or advisory/consulting firm, including experience as an internal auditor and/or public accountant
Expertise in designing, coding, testing, and delivering software in at least one technology stack
Proficiency in one or more technology domains, may be a cross-domain expert able to solve complex and mission critical problems within a business or across the firm
Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage and networks)
Excellent debugging and troubleshooting skills
Expert in performance monitoring and capacity management of large systems using various tools
Deep expertise in the instrumentation, customization and use of modern monitoring tools such as Dynatrace, AppDynamics, Grafana, Prometheus, ThousandEyes, Splunk and Geneos
Expert in designing, coding, testing, and delivering software in at least one technology stack (Java/J2EE, C#/.NET)
Exposure to Python and a willingness to become an expert in it for building application health dashboards and machine learning projects
Expert in at least one relational database (e.g., SQL Server, Oracle, DB2)
Working knowledge of Groovy, batch scripting, Ansible, PowerShell or shell scripting
Working knowledge of infrastructure components like routers, load balancers and networks
Comfortable working in an Agile environment and proficient in Continuous Integration and Continuous Delivery (CI/CD)
Solid understanding of object-oriented design methodologies