How can I open an Excel file in Python?

Question 1

How do I open an Excel file for reading in Python?

I have opened text files, for example sometextfile.txt, with the read command. How do I do that for an Excel file?

Question 2

Edit:
In newer versions of pandas, you can pass the sheet name as a parameter.

file_name = 'path/to/file.xlsx'  # placeholder: path to file + file name
sheet = 0  # placeholder: sheet name, sheet number (0-based), a list of sheet numbers and names, or None for all sheets

import pandas as pd
df = pd.read_excel(io=file_name, sheet_name=sheet)
print(df.head(5))  # print first 5 rows of the dataframe

See the documentation for examples of how to pass sheet_name:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
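A minimal sketch of the different sheet_name values, assuming pandas and openpyxl are installed. To run on its own it first writes a hypothetical two-sheet workbook (demo.xlsx in a temporary directory, with made-up sheets "People" and "Cities"); with an existing file you would only need the read_excel calls.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook so the example is self-contained
# (file name and sheet names are hypothetical).
tmp_dir = tempfile.mkdtemp()
file_name = os.path.join(tmp_dir, "demo.xlsx")
with pd.ExcelWriter(file_name) as writer:
    pd.DataFrame({"name": ["john", "mary"], "age": [30, 25]}).to_excel(
        writer, sheet_name="People", index=False
    )
    pd.DataFrame({"city": ["Paris", "Lima"]}).to_excel(
        writer, sheet_name="Cities", index=False
    )

# A sheet name or a 0-based position returns a single DataFrame.
people = pd.read_excel(file_name, sheet_name="People")
first_sheet = pd.read_excel(file_name, sheet_name=0)

# A list of names and/or positions returns a dict keyed by those same items.
some = pd.read_excel(file_name, sheet_name=["People", 1])

# None reads every sheet into a dict keyed by sheet name.
all_sheets = pd.read_excel(file_name, sheet_name=None)

print(people.head())
print(list(all_sheets))  # sheet names, in workbook order
```

Reading with sheet_name=None is convenient when you don't know the sheet names in advance; you can then loop over all_sheets.items().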

Previous version:
You can also use the pandas package...

When you are working with an Excel file that has multiple sheets, you can use:

import os
import pandas as pd

xl = pd.ExcelFile(os.path.join(path, filename))  # path and filename are placeholders
xl.sheet_names
# ['Sheet1', 'Sheet2', 'Sheet3']

df = xl.parse("Sheet1")
df.head()

df.head() returns the first 5 rows of the sheet (an interactive session displays them).

If you are working with an Excel file that has a single sheet, you can simply use:

import os
import pandas as pd
df = pd.read_excel(os.path.join(path, filename))  # reads the first sheet by default
print(df.head())
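When you need several sheets, opening the file once with pd.ExcelFile and parsing each sheet from it avoids reopening the workbook for every sheet. A minimal sketch, assuming pandas and openpyxl are installed; the workbook report.xlsx and its contents are hypothetical and are created in a temporary directory only so the example runs by itself.

```python
import os
import tempfile

import pandas as pd

# Create a hypothetical workbook with three sheets.
path = tempfile.mkdtemp()
filename = "report.xlsx"
with pd.ExcelWriter(os.path.join(path, filename)) as writer:
    for i, name in enumerate(["Sheet1", "Sheet2", "Sheet3"]):
        pd.DataFrame({"value": [i, i + 1]}).to_excel(writer, sheet_name=name, index=False)

# Open the file once; the context manager closes it afterwards.
with pd.ExcelFile(os.path.join(path, filename)) as xl:
    print(xl.sheet_names)
    # parse() reads one sheet; the dict comprehension collects all of them.
    frames = {name: xl.parse(name) for name in xl.sheet_names}

print(frames["Sheet2"].head())
```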

Question 3

Try the xlrd library. Note that xlrd 2.0 and later only read the legacy .xls format; for .xlsx files use openpyxl instead.

[Edit]: From what I can see in your comment, something like the snippet below might be enough. I'm assuming you are only searching one column for the word 'john', but you could add more columns or turn this into a more generic function.

from xlrd import open_workbook

book = open_workbook('simple.xls', on_demand=True)
for name in book.sheet_names():
    if name.endswith('2'):
        sheet = book.sheet_by_name(name)

        # Attempt to find a matching row (search the first column for 'john')
        row_index = -1
        for i, cell in enumerate(sheet.col(0)):
            if 'john' in str(cell.value):  # str() so numeric cells don't raise TypeError
                row_index = i
                break

        # If we found the row, print it
        if row_index != -1:
            for cell in sheet.row(row_index):
                print(cell.value)

        book.unload_sheet(name)
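Because xlrd 2.x no longer opens .xlsx files, here is a rough equivalent of the same search written with openpyxl instead (a swapped-in library, not the approach in this answer). The helper find_row is a hypothetical name: it returns the values of the first row whose first-column cell contains the search word, converting the cell to text first. The workbook is built in memory so the sketch runs without a file on disk.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def find_row(ws, needle, column=0):
    """Return the values of the first row whose cell in `column` contains `needle`, or None."""
    for row in ws.iter_rows(values_only=True):
        cell = row[column]
        if cell is not None and needle in str(cell):
            return row
    return None


# Build a small hypothetical workbook in memory (sheet name ends with '2', as in the answer).
wb = Workbook()
ws = wb.active
ws.title = "Data2"
ws.append(["name", "age"])
ws.append(["mary", 25])
ws.append(["john smith", 30])
buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)

book = load_workbook(buffer, read_only=True)
for name in book.sheetnames:
    if name.endswith("2"):
        match = find_row(book[name], "john")
        if match is not None:
            print(match)
book.close()
```

With a real file you would pass its path to load_workbook instead of the in-memory buffer.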

Question 4

This isn't as simple as opening a plain text file: it requires some kind of external module, since nothing built in does this. Here are some options:

http://www.python-excel.org/

If possible, you may want to consider exporting the Excel spreadsheet as a CSV file and then reading it with Python's built-in csv module:

http://docs.python.org/library/csv.html
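A minimal sketch of the CSV route, assuming the sheet has already been exported from Excel as CSV. The file sheet.csv and its contents are hypothetical; the example writes them first so it runs on its own. csv.DictReader uses the first row as column names.

```python
import csv
import os
import tempfile

# Stand-in for a sheet exported from Excel as CSV (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), "sheet.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    f.write("name,age\njohn,30\nmary,25\n")

# Open in text mode with newline="", as the csv docs recommend.
# "utf-8-sig" also strips the byte-order mark if the export added one.
with open(path, newline="", encoding="utf-8-sig") as f:
    rows = list(csv.DictReader(f))  # each row is a dict keyed by the header row

for row in rows:
    print(row["name"], row["age"])  # values are strings; convert as needed
```

The encoding is an assumption: depending on the Excel version and export option, the CSV may use a local code page instead of UTF-8.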

Question 5

There is the openpyxl package:

>>> from openpyxl import load_workbook
>>> wb2 = load_workbook('test.xlsx')
>>> print(wb2.sheetnames)
['Sheet2', 'New Title', 'Sheet1']

>>> worksheet1 = wb2['Sheet1'] # one way to load a worksheet
>>> worksheet2 = wb2.worksheets[0] # another way: by position (here 'Sheet2')
>>> print(worksheet1['D18'].value)
3
>>> for row in worksheet1.iter_rows():
...     print(row[0].value)
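For larger files, a sketch of reading plain values with openpyxl's read-only mode, assuming a recent openpyxl version (iter_rows with values_only is not available in very old releases). The workbook is created in memory purely so the example is self-contained; with a real file you would pass its path, such as 'test.xlsx', to load_workbook.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Hypothetical workbook built in memory, standing in for 'test.xlsx'.
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
for r in range(1, 4):
    ws.append([r, r * 10])
buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)

# read_only streams the sheet instead of loading it all;
# data_only returns the values Excel last calculated for formula cells.
wb2 = load_workbook(buffer, read_only=True, data_only=True)
names = wb2.sheetnames
sheet = wb2["Sheet1"]
print(names)

# values_only yields plain tuples of cell values instead of Cell objects.
first_column = [row[0] for row in sheet.iter_rows(min_row=1, max_row=3, values_only=True)]
print(first_column)
wb2.close()  # read-only workbooks keep the file open until closed
```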

Question 6

You can use the xlpython package, which only requires xlrd. You can find it here: https://pypi.python.org/pypi/xlpython and its documentation here: https://github.com/morfat/xlpython

Question 7

This may help:

This creates a node that takes a 2D list (a list of lists) and writes it to the Excel spreadsheet. Make sure the IN[] inputs are present, or it will throw an exception.

This is a rewrite of the Revit Dynamo Excel node for Excel 2013, since the default prepackaged node kept breaking. I also have a similar read node. Excel syntax in Python is finicky.

Thanks @CodingNinja, updated :)

###Export Excel - intended to replace malfunctioning excel node

import clr

clr.AddReferenceByName('Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c')
##AddReferenceGUID("{00020813-0000-0000-C000-000000000046}") ''Excel                            C:\Program Files\Microsoft Office\Office15\EXCEL.EXE 
##Need to verify interop for version 2015 is 15 and node attachment for it.
from Microsoft.Office.Interop import  * ##Excel
################################Initialize FP and Sheet ID
##Same functionality as the excel node
strFileName = IN[0]             ##Filename
sheetName = IN[1]               ##Sheet
RowOffset= IN[2]                ##RowOffset
ColOffset= IN[3]                ##Column offset
Data=IN[4]                      ##Data
Overwrite=IN[5]                 ##Check for auto-overwrite
XLVisible = False   #IN[6]      ##XL Visible for operation or not?

RowOffset=0
if IN[2]>0:
    RowOffset=IN[2]             ##RowOffset

ColOffset=0
if IN[3]>0:
    ColOffset=IN[3]             ##Column offset

if IN[6]!=False:
    XLVisible = True #IN[6]     ##XL Visible for operation or not?

################################Initialize FP and Sheet ID
xlCellTypeLastCell = 11                 #####define special cells value constant
################################
xls = Excel.ApplicationClass()          ####Connect with application
xls.Visible = XLVisible                 ##VISIBLE YES/NO
xls.DisplayAlerts = False               ### Suppress alerts

import os.path

if os.path.isfile(strFileName):
    wb = xls.Workbooks.Open(strFileName, False)     ####Open the file 
else:
    wb = xls.Workbooks.Add()            ####Create a new workbook (Add must be called)
    wb.SaveAs(strFileName)
wb.Application.Visible = XLVisible      ####Show Excel
try:
    ws = wb.Worksheets(sheetName)       ####Get the sheet in the WB base

except:
    ws = wb.Sheets.Add()                ####If it doesn't exist, add it. Use () to call the object method
    ws.Name = sheetName



#################################
#lastRow for iterating rows
lastRow=ws.UsedRange.SpecialCells(xlCellTypeLastCell).Row
#lastCol for iterating columns
lastCol=ws.UsedRange.SpecialCells(xlCellTypeLastCell).Column
#######################################################################
out=[]                                  ###MESSAGE GATHERING

c=0
r=0
val=""
if Overwrite == False :                 ####Look ahead for non-empty target cells to throw error
    for r, row in enumerate(Data):   ####BASE 0## EACH ROW OF DATA ENUMERATED in the 2D array #range( RowOffset, lastRow + RowOffset):
        for c, col in enumerate (row): ####BASE 0## Each column in each row is a cell with data ### in range(ColOffset, lastCol + ColOffset):
            existing = ws.Cells[r+1+RowOffset,c+1+ColOffset].Value2   ##check the target cell, not the data value
            if existing is not None and str(existing) != "":
                OUT= "ERROR- Cannot overwrite"
                raise ValueError("ERROR- Cannot overwrite")
##out.append(Data[0]) ##append message for error
############################################################################

for r, row in enumerate(Data):   ####BASE 0## EACH ROW OF DATA ENUMERATED in the 2D array #range( RowOffset, lastRow + RowOffset):
    for c, col in enumerate (row): ####BASE 0## Each column in each row is a cell with data ### in range(ColOffset, lastCol + ColOffset):
        ws.Cells[r+1+RowOffset,c+1+ColOffset].Value2 = str(col)

##run macro disabled for debugging excel macro
##xls.Application.Run("Align_data_and_Highlight_Issues")
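Outside Dynamo/IronPython, the same idea (write a 2D list at a row/column offset, refusing to overwrite non-empty cells unless told to) can be sketched with openpyxl instead of COM interop. This is a swapped-in library, not the node above; the function name write_block and its parameters are hypothetical, and the offsets are 0-based like RowOffset/ColOffset in the node.

```python
from openpyxl import Workbook


def write_block(ws, data, row_offset=0, col_offset=0, overwrite=False):
    """Write a 2D list into ws starting at row row_offset + 1, column col_offset + 1.

    If overwrite is False, check every target cell first and raise
    ValueError before writing anything if one already holds a value.
    """
    if not overwrite:
        for r, row in enumerate(data):
            for c, _ in enumerate(row):
                existing = ws.cell(row=r + 1 + row_offset, column=c + 1 + col_offset).value
                if existing not in (None, ""):
                    raise ValueError("ERROR- Cannot overwrite")
    for r, row in enumerate(data):
        for c, value in enumerate(row):
            ws.cell(row=r + 1 + row_offset, column=c + 1 + col_offset, value=value)


wb = Workbook()
ws = wb.active
write_block(ws, [["a", 1], ["b", 2]], row_offset=1, col_offset=2)
print(ws["C2"].value, ws["D3"].value)  # a 2
```

Unlike the node, this keeps numbers as numbers instead of converting every value to a string, and the workbook still has to be saved with wb.save(...) afterwards.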

Question 8

This code worked for me with Python 3.5.2. It creates and saves a CSV file that Excel can open. I'm currently working on how to save data to the file, but this is the code:

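import csv
with open("file1.csv", "w", newline="") as f:  # Python 3 needs text mode; "wb" was the Python 2 idiom
    excel = csv.writer(f)  # write rows with excel.writerow(...) inside this block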
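A minimal sketch of actually saving data with the csv module in Python 3; the rows and the temporary location of file1.csv are hypothetical. Note that this produces a CSV text file that Excel can open, not a native .xlsx workbook (for that you would need a library such as openpyxl).

```python
import csv
import os
import tempfile

rows = [["name", "age"], ["john", 30], ["mary", 25]]  # hypothetical data
path = os.path.join(tempfile.mkdtemp(), "file1.csv")

# Text mode with newline="" lets the csv module control line endings itself.
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(rows)  # non-string values are written via str()

# Read it back to check the round trip (every value comes back as a string).
with open(path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))
print(read_back)
```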

Question 9

import pandas as pd
import os
directory = 'path/to/files/directory/'
files = os.listdir(directory)
desiredFile = files[i]  # i: index of the file you want
Ofile = os.path.join(directory, desiredFile)
xls_import = pd.read_excel(Ofile)  # read_excel for .xls/.xlsx; read_csv only parses CSV text

Now you can use the power of pandas DataFrames!
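If the goal is to load every Excel file in that directory rather than one picked by index, here is a sketch with pathlib and pd.concat, assuming pandas and openpyxl are installed and that all files share the same columns. The directory and the files a.xlsx and b.xlsx are hypothetical; the example creates them first so it runs by itself.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical directory with two workbooks that share the same columns.
directory = Path(tempfile.mkdtemp())
pd.DataFrame({"name": ["john"], "age": [30]}).to_excel(directory / "a.xlsx", index=False)
pd.DataFrame({"name": ["mary"], "age": [25]}).to_excel(directory / "b.xlsx", index=False)

# sorted() gives a stable order; the glob pattern skips non-.xlsx files.
frames = [pd.read_excel(path) for path in sorted(directory.glob("*.xlsx"))]
combined = pd.concat(frames, ignore_index=True)  # renumber rows 0..n-1
print(combined)
```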